Recently I have been consuming Kafka data with Spark Streaming and landing it on HDFS, one small file per minute. Yesterday a colleague on the architecture team asked me to clean up the historical files, but there were too many directories to delete by hand, so the idea was to collect all the directory paths, write them into a text file, path_to_clean.txt, and loop over the paths in a shell script to run the deletes. The paths look like this (a sketch of one way to generate such a list follows the sample):
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-01-09
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-07-05
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-09-05
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-10-20
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-11-06
hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2/date=2016-11-07
...
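For reference, here is a minimal sketch of one way such a list could be generated, assuming the date partitions sit directly under a single parent directory like the mb_pageinfo_hash2 path above; the PARENT variable and the CUTOFF date below are illustrative assumptions, not part of the original setup.

#!/bin/bash
# Sketch (assumptions): list partition directories under PARENT and keep
# those whose date= value is older than CUTOFF, writing them to path_to_clean.txt.
PARENT=hdfs://nameservice1/user/hadoop/dw_realtime/dw_real_for_path_list/mb_pageinfo_hash2
CUTOFF=2016-11-01   # hypothetical cutoff date

hadoop fs -ls "$PARENT" | awk '{print $NF}' | grep 'date=' |
while read dir
do
  d=${dir##*date=}          # extract the date part, e.g. 2016-01-09
  if [[ "$d" < "$CUTOFF" ]]; then
    echo "$dir"
  fi
done > path_to_clean.txt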
The scripts are below.
Method 1 (this is the one I actually used)
#!/bin/bash
# Method 1: word-split the file contents with a for loop
# (fine here because the HDFS paths contain no spaces)
for line in `cat path_to_clean.txt`
do
  echo "$line"
  # recursively delete the HDFS directory
  hadoop fs -rm -r "$line"
done
Method 2
#!/bin/bash
# Method 2: read the file line by line via input redirection;
# the hadoop fs -rm -r call from method 1 would go inside this loop
while read line
do
  echo "$line"
done < path_to_clean.txt
Method 3
#!/bin/bash
# Method 3: pipe the file into a while-read loop
# (note: the pipe runs the loop body in a subshell)
cat path_to_clean.txt | while read line
do
  echo "$line"
done
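Methods 2 and 3 above only echo the paths; to actually run the cleanup with them, the hadoop fs -rm -r call from method 1 goes inside the loop. A minimal sketch based on method 2 (input redirection is used rather than a pipe, so the loop does not run in a subshell):

#!/bin/bash
# Sketch: method 2 combined with the delete step from method 1
while read line
do
  echo "deleting $line"
  hadoop fs -rm -r "$line"
done < path_to_clean.txt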