Problem
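You have a directory containing a month of your diary entries, all as txt files. To avoid word-segmentation issues, assume the content is entirely in English. For each diary entry, find the word you consider most important.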
Code
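"""
You have a directory containing a month of your diary entries, all as txt files.
To avoid word-segmentation issues, assume the content is entirely in English.
For each diary entry, find the word you consider most important.
"""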
from collections import Counter
import os


def get_diary_path():
    # Build the path of every file in the diary directory
    dir_path = './diary'
    paths = []  # named "paths" so the built-in list is not shadowed
    for name in os.listdir(dir_path):
        paths.append(os.path.join(dir_path, name))
    return paths


def get_common_word(paths):
    common_words = []
    for path in paths:
        words = []
        with open(path, 'r') as f:  # open the file
            for line in f:
                # split() with no argument splits on any whitespace and drops
                # empty strings, so newlines and blank lines add nothing
                words.extend(line.split())
        # the single most frequent word in this entry, as [(word, count)]
        common_word = Counter(words).most_common(1)
        common_words.append(common_word)
    return common_words


if __name__ == '__main__':
    paths = get_diary_path()
    words = get_common_word(paths)
    print(words)
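
The script above counts raw whitespace-separated tokens, so "Rain", "rain," and "rain" are counted as three different words, and the winner is often a filler word such as "the" or "I". One possible refinement, shown here only as a minimal sketch, is to lowercase the text, strip punctuation, and skip common filler words before counting. The function name `most_important_word` and the tiny `STOP_WORDS` set are assumptions made for illustration; they are not part of the original solution, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {'a', 'an', 'and', 'the', 'i', 'to', 'of', 'in',
              'it', 'is', 'was', 'my', 'we', 'at', 'on'}


def most_important_word(text, stop_words=STOP_WORDS):
    """Return the most frequent non-stop-word in text, or None if there is none."""
    # Lowercase first so "Rain" and "rain" are counted together, then keep
    # only runs of letters and apostrophes, which drops punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == '__main__':
    sample = "The rain came back. I walked in the rain, and the rain was cold."
    print(most_important_word(sample))
```

To use it with the script above, read each file's full text with `f.read()` and pass it to `most_important_word` instead of collecting tokens line by line.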