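During the 2020 Spring Festival, the spread of the coronavirus in China was impossible to ignore, and news outlets large and small reported on it one after another. The flood of news only gets more aggravating the more you read, so it is better to organize the keywords with a word cloud and use it to sort out the main threads of the epidemic.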
1. Copy 20-odd related news articles from the major news platforms to a local file to serve as the data source;
2. Generate a word cloud with kumo (a word cloud library written in Java);
3. Analyze the word cloud result.
import com.kennycason.kumo.CollisionMode;
import com.kennycason.kumo.WordCloud;
import com.kennycason.kumo.WordFrequency;
import com.kennycason.kumo.bg.RectangleBackground;
import com.kennycason.kumo.font.KumoFont;
import com.kennycason.kumo.font.scale.LinearFontScalar;
import com.kennycason.kumo.nlp.FrequencyAnalyzer;
import com.kennycason.kumo.nlp.filter.Filter;
import com.kennycason.kumo.nlp.tokenizers.ChineseWordTokenizer;
import com.kennycason.kumo.palette.LinearGradientColorPalette;
import java.awt.*;
import java.io.IOException;
import java.util.List;
public class WordCloudTest {
    public static void main(String[] args) throws IOException {
        // Build the frequency analyzer and set how many words to return and the
        // minimum word length. Tune these parameters to your own data.
        FrequencyAnalyzer frequencyAnalyzer = new FrequencyAnalyzer();
        // Maximum number of keywords to display
        frequencyAnalyzer.setWordFrequenciesToReturn(500);
        // Drop tokens that consist only of digits (years, counts, and so on)
        Filter filter = new Filter() {
            @Override
            public boolean test(String s) {
                return !s.matches("^\\d+$");
            }
        };
        frequencyAnalyzer.setFilter(filter);
        // Minimum keyword length in characters (shorter words are ignored)
        frequencyAnalyzer.setMinWordLength(3);
        // Use the Chinese tokenizer
        frequencyAnalyzer.setWordTokenizer(new ChineseWordTokenizer());
        final List<WordFrequency> wordFrequencies = frequencyAnalyzer.load("D:\\news.txt");
        // Initialize the canvas
        Dimension dimension = new Dimension(800, 600);
        WordCloud wordCloud = new WordCloud(dimension, CollisionMode.RECTANGLE);
        wordCloud.setPadding(0);
        // The original passed style 5, which is not a valid java.awt.Font style and is
        // treated as PLAIN. If "STSong-Light" is not installed, Java substitutes a default font.
        java.awt.Font font = new java.awt.Font("STSong-Light", java.awt.Font.PLAIN, 100);
        wordCloud.setBackgroundColor(new Color(255, 255, 255));
        wordCloud.setKumoFont(new KumoFont(font));
        wordCloud.setBackground(new RectangleBackground(dimension));
        // Alternative shape: wordCloud.setBackground(new CircleBackground(255));
        // Gradient from gray to green in 300 steps
        wordCloud.setColorPalette(new LinearGradientColorPalette(Color.gray, Color.GREEN, 300));
        // Font sizes range from 20 to 100
        wordCloud.setFontScalar(new LinearFontScalar(20, 100));
        wordCloud.build(wordFrequencies);
        wordCloud.writeToFile("D:\\news.png");
    }
}
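To make the analyzer settings above less of a black box, here is a minimal plain-Java sketch of the kind of counting they describe: skip digit-only tokens, skip tokens shorter than a minimum length, count the rest, and keep the most frequent ones. `FrequencySketch` and `topWords` are hypothetical names made up for this illustration. This is not kumo's implementation, and it assumes the text has already been split into tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of kumo.
class FrequencySketch {

    // Returns up to `limit` (word, count) entries, most frequent first.
    // Digit-only tokens and tokens shorter than `minLength` are skipped.
    static List<Map.Entry<String, Integer>> topWords(List<String> tokens, int minLength, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            if (token.matches("^\\d+$") || token.length() < minLength) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending; break ties alphabetically so the order is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }
}
```

With `minLength` set to 3 and `limit` set to 500, this mirrors the values used in the listing, although kumo's own tokenizing and filtering may differ in detail.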
Word cloud result
news.png
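Reading the word cloud is simple: the larger the font, the more often that keyword appears in the source articles. Pick out the keywords you care about from the cloud, then look up the details on Google or Baidu. That way you are not a headless fly, getting more panicked the more of this jumble of information you read.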
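The listing above configures `LinearFontScalar(20, 100)`, so font size presumably grows in a straight line from 20 for the rarest word to 100 for the most frequent one. The sketch below shows that mapping as plain linear interpolation. `LinearScaleSketch` and `scale` are hypothetical names, and this is an illustration of the idea rather than kumo's source code.

```java
// Hypothetical helper for illustration only; not part of kumo.
class LinearScaleSketch {

    // Maps a word frequency in [minFrequency, maxFrequency] onto a font size
    // in [minFont, maxFont] by linear interpolation.
    static float scale(int frequency, int minFrequency, int maxFrequency, int minFont, int maxFont) {
        if (maxFrequency == minFrequency) {
            // All words are equally frequent; using the smallest size is an arbitrary choice.
            return minFont;
        }
        double ratio = (double) (frequency - minFrequency) / (maxFrequency - minFrequency);
        return (float) (minFont + ratio * (maxFont - minFont));
    }
}
```

For example, if frequencies range from 1 to 21, a word seen 11 times sits halfway along that range, so `scale(11, 1, 21, 20, 100)` gives a size halfway between 20 and 100.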
PS: The more source data you have, the more representative the result will be. Interested readers can copy more articles by hand or scrape them with a crawler to get the result they want.
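If you save each article as its own text file, one way to build the single input file that `frequencyAnalyzer.load(...)` reads is to concatenate them. The sketch below is one possible approach, assuming UTF-8 files with a `.txt` extension in one folder. `NewsMerger` and `merge` are hypothetical names, and the folder layout is an assumption, not something from the original setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
class NewsMerger {

    // Concatenates every .txt file in sourceDir (sorted by path) into target
    // and returns the number of files merged. The target itself is skipped
    // in case it already exists inside sourceDir.
    static int merge(Path sourceDir, Path target) throws IOException {
        Path out = target.toAbsolutePath().normalize();
        List<Path> files;
        try (Stream<Path> stream = Files.list(sourceDir)) {
            files = stream
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .filter(p -> !p.toAbsolutePath().normalize().equals(out))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            lines.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
        return files.size();
    }
}
```

A call such as `NewsMerger.merge(Paths.get("D:\\articles"), Paths.get("D:\\news.txt"))` would then produce the file used in the listing, where `D:\\articles` is a made-up folder name.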
If you found this interesting, remember to give it a like~