Python: notes on word-frequency counting for Chinese and English text

If you work as a crawler engineer, word-frequency counting is worth understanding. When processing text for public-opinion analysis, you often need to count how many times each word occurs, or find the top 10 words in a text, as a small first step toward simple data analysis later. Let's get started.

import re

text = '''Which year will be the turning point for the world's most populous country in which its population experiences negative growth? Chinese demographers differ in their answers.
Experts with the Chinese Academy of Social Sciences estimate the turning point could arrive around 2028 after the population peaks to 1.44 billion, says the Green Book of Population and Labor co-released by the Chinese Academy of Social Sciences and Social Sciences Academic Press on Thursday. 
However, Huang Wenzheng, a demographics expert, told the Global Times on Friday that this estimate is too optimistic. He estimated the year 2024 or 2025 will be the threshold for population negative growth.
According to Huang, the prediction in the green book is based on the fertility rate that could remain at 1.6, which is hard to realize. 
In 2016, China's fertility rate was 1.7, but in 2017, the number of births was less, according to media reports. 
The births in 2016 and 2017 were high compared to years before, said Huang. "This was due to the introduction of two-child policy for all families [in 2016] which encouraged those who had the willingness to have a second child before the policy. So they hastened to give birth in these two years."
"But the overall trend is that people are no longer willing to have more children."
Huang elaborated that people's concept of raising children has changed. Urban people care about quality, rather than quantity. "They want to provide the best resources they have to bring up their children. This won't be possible if they have several," he said. 
With rapid urbanization, many people from rural areas come to work in the city and also follow this practice. 
"Previously people thought that having two or three children is normal. But now they are accustomed to having only one child. They find this normal," Huang said.
Yi Fuxian, a research fellow at the University of Wisconsin-Madison, holds a more pessimistic view. He told the Global Times that 2018 has seen negative growth based on his own research and analysis. 
Both Yi and Huang believe that China will abandon the two-child policy this year, putting an end to family planning, in order to stimulate births. They also warned that the sharp decline in population could have negative influence on the economy.
China has introduced a series of new measures to stimulate fertility. This year, the country's tax cuts also favor families with children. Families are able to deduct 12,000 yuan ($1,748) a year from their taxable income for children's education.
Huang said this is still far from enough. He suggested the government provide free upbringing of children aged 0 to 3 and make kindergarten education compulsory to further ease the burden of educating children. 


'''
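# Word-frequency count
def word_count(string):
    if isinstance(string, str):
        new_text = string.strip()
        # Split on runs of whitespace (raw string avoids an invalid-escape warning)
        str_list = re.split(r'\s+', new_text)
        word_dict = {}
        for str_word in str_list:
            if str_word in word_dict:  # if the key already exists, add 1 to its value
                word_dict[str_word] = word_dict[str_word] + 1
            else:
                word_dict[str_word] = 1
        return word_dict
    else:
        # Python 3 cannot raise a bare string; raise an exception instance instead
        raise TypeError('Please enter a string')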


word = word_count(string=text)
#print(word)

# Sort the counts in descending order and take the top 10
# (the slice [0:11] would return 11 items, so use [:10])
word_list = sorted(word.items(), key=lambda x: x[1], reverse=True)[:10]
print(word_list)

The output (shown as a screenshot in the original post) lists the top 10 words in the text together with their counts.
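One limitation of splitting on whitespace alone is that "Huang," and "Huang", or "The" and "the", are counted as different words. The following is a minimal sketch of one way around that, assuming you want case-insensitive counts with punctuation stripped. The function name `top_words` and the sample sentence are illustrative and not part of the original code; it uses `collections.Counter` from the standard library instead of a hand-built dict.

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    if not isinstance(text, str):
        raise TypeError('Please enter a string')
    # Lowercase, then keep runs of letters/digits, allowing inner
    # apostrophes and hyphens so "world's" and "two-child" stay whole.
    words = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())
    return Counter(words).most_common(n)

# Hypothetical sample sentence, used only for illustration
sample = "The cat saw the dog. The dog, however, ignored the cat!"
print(top_words(sample, 3))  # [('the', 4), ('cat', 2), ('dog', 2)]
```

Calling `top_words(text)` on the article text above should work the same way. `Counter.most_common(n)` returns exactly `n` items, which avoids off-by-one mistakes when slicing.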

That covers word-frequency counting for English. Next, let's look at how to count words in Chinese text.

Chinese text has no spaces between words, so we first need a third-party library, jieba, to segment it.
Install: pip install jieba
Segment the text:
import jieba
content_text = '''然而，我們并沒有時間去探索數據集中的數千個案例。我們應該做的則是在測試案例的典型范例上繼續運行LIME，看看哪些詞的占有率仍能位居前列。通過這種方法，我們可以獲得像以前模型那樣的單詞的重要性分數，并驗證模型的預測'''

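def get_(string):
    # cut_all=True is jieba's full mode: it emits every word it can find,
    # so overlapping words are all counted
    words = list(jieba.cut(string, cut_all=True))
    word_dict = {}  # avoid shadowing the built-ins dict and str
    for word in words:
        if word != '' and word != '\n':  # skip empty strings and newlines
            if word in word_dict:
                word_dict[word] = word_dict[word] + 1
            else:
                word_dict[word] = 1
    return word_dict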

word = get_(string=content_text)
# Take the top 10 words ([0:11] would return 11 items)
word_list = sorted(word.items(), key=lambda x: x[1], reverse=True)[:10]
print(word_list)

The original post showed a screenshot of the Chinese word-frequency result here.
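The Chinese version filters out only empty strings and newlines, so punctuation tokens such as "，" and "。" can still end up in the counts. Below is a small sketch of a stricter filter, assuming you want to drop punctuation-only tokens and optionally very short words. The names `count_tokens` and `min_len` are illustrative, and the token list is a hand-segmented stand-in (hypothetical data, not real jieba output) so the example runs without jieba installed.

```python
from collections import Counter

def count_tokens(tokens, min_len=1):
    """Count tokens, skipping whitespace-only and punctuation-only entries."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        # Keep a token only if it is long enough and contains at least
        # one letter, digit or CJK character (str.isalnum covers all three).
        if len(tok) >= min_len and any(ch.isalnum() for ch in tok):
            counts[tok] += 1
    return counts

# Hand-segmented stand-in for a segmenter's output (hypothetical data)
tokens = ['我們', '可以', '驗證', '模型', '，', '模型', '的', '預測', '。', '\n', '']
print(count_tokens(tokens, min_len=2).most_common(3))
# [('模型', 2), ('我們', 1), ('可以', 1)]
```

With jieba installed, you could pass `jieba.lcut(content_text)` as `tokens`. Setting `min_len=2` is a crude way to drop single-character function words such as "的"; a proper stop-word list would be more precise.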

That's all for today's notes. If you're interested, feel free to message me.
