Reading and writing files
Store the text to be read in info.txt; content is a str.
with open('info.txt', 'r', encoding='utf-8') as file1:  # the with statement closes the file automatically
    content = file1.read()  # whole file as one str
The output file is output.txt; content_after is the string to write.
with open('output.txt', 'w', encoding='utf-8') as file2:
    file2.write(content_after + "\n")
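The read and write steps above can be combined into one round trip. A minimal sketch, assuming the hypothetical helper names `read_text` and `write_text` (not part of the original notes) and a temporary directory in place of the real info.txt and output.txt:

```python
import os
import tempfile


def read_text(path):
    """Return the whole file as a single str (hypothetical helper)."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


def write_text(path, text):
    """Write text plus a trailing newline, replacing any existing file (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text + "\n")


# Round trip in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'output.txt')
    write_text(out_path, "今天天气很好")
    round_trip = read_text(out_path)
```

Mode 'w' truncates an existing file; 'a' would append instead.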
Word segmentation
import jieba

# call jieba.cut; it returns a generator, so it can be iterated only once
sentence_seged = jieba.cut(content)
Removing stop words
- Build the stop word list
Put the stop words in stop.txt, one word per line.
# stopwords is the list of stop words
with open('stop.txt', 'r', encoding='utf-8') as f:
    stopwords = [line.strip() for line in f]
- Iterate and remove stop words
outstr = ''  # result string
for word in sentence_seged:
    if word not in stopwords:
        outstr += word + " "
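Membership tests on a list scan every entry, so for a long stop word list a set is usually faster. A minimal sketch of the same filtering step, assuming the hypothetical helper names `load_stopwords` and `remove_stopwords` and a plain list of tokens standing in for the output of jieba.cut:

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stop word per line into a set, skipping blank lines (hypothetical helper)."""
    with open(path, 'r', encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Join the tokens that are neither stop words nor whitespace with single spaces."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return " ".join(kept)


# Demo with a temporary stop word file; the tokens stand in for jieba.cut output.
with tempfile.TemporaryDirectory() as tmp:
    stop_path = os.path.join(tmp, 'stop.txt')
    with open(stop_path, 'w', encoding='utf-8') as f:
        f.write("的\n了\n\n")
    stopwords = load_stopwords(stop_path)

tokens = ["我", "喜欢", "的", "天气", "了"]
outstr = remove_stopwords(tokens, stopwords)
```

Unlike string concatenation in a loop, `" ".join` leaves no trailing space and avoids rebuilding the string on every iteration.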
Generating the word cloud
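from PIL import Image
import numpy as np
from wordcloud import WordCloud

images = Image.open("something.png")  # open the saved image
maskImages = np.array(images)  # convert it to a numpy array for use as the mask
wc = WordCloud(font_path="msyh.ttc", background_color="white", max_words=100, max_font_size=100, mask=maskImages).generate(content_after)  # generate the word cloud
wc.to_file('wordCloudPic.png')  # save to a local image file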
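To check which words will dominate the picture before rendering, the space-separated string can be counted with the standard library. A minimal sketch, assuming a hypothetical helper `top_words` and a made-up sample string; it only approximates what WordCloud does, since WordCloud applies its own tokenization and stop word handling:

```python
from collections import Counter


def top_words(text, max_words=100):
    """Return the max_words most frequent space-separated tokens with their counts (hypothetical helper)."""
    counts = Counter(text.split())
    return counts.most_common(max_words)


# Made-up sample in the same space-separated form as the filtered text.
sample = "天气 很好 天气 喜欢 天气 喜欢"
ranking = top_words(sample, max_words=2)
```

A dict of such counts can also be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the raw text.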