Paper: Bag of Tricks for Efficient Text Classification
1. Introduction
We evaluate the quality of our approach fastText on two different tasks, namely tag prediction and sentiment analysis.
Two evaluation tasks: tag prediction and sentiment analysis.
2. Model architecture
A simple and efficient baseline for sentence classification is to represent sentences as bag of words (BoW) and train a linear classifier, e.g., a logistic regression or an SVM.
Sentence classification: represent the sentence with a bag of words (BoW), then train a linear classifier.
However, linear classifiers do not share parameters among features and classes. This possibly limits their generalization in the context of large output space where some classes have very few examples. Common solutions to this problem are to factorize the linear classifier into low rank matrices or to use multilayer neural networks.
Drawback of linear classifiers: they do not share parameters, so generalization suffers when the output space is large.
Solutions: factorize the linear classifier into low-rank matrices, or use a multilayer neural network.
The first weight matrix A is a look-up table over the words.
The word representations are then averaged into a text representation, which is in turn fed to a linear classifier.
The text representation is a hidden variable which can potentially be reused.
The fastText model is similar to CBOW: CBOW averages the word vectors of the context words to predict the center word, while fastText averages the word vectors of the whole document to predict the label.
A softmax is used to compute the class probabilities, and the negative log-likelihood is used as the loss function.
This model is trained asynchronously on multiple CPUs using stochastic gradient descent and a linearly decaying learning rate.
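As a concrete illustration of this architecture, here is a minimal NumPy sketch (not the actual fastText implementation): a look-up table A, the averaged text representation, a linear classifier B, the softmax and the negative log-likelihood. All sizes and ids below are made-up toy values.

```python
import numpy as np

# Toy dimensions (made-up values for illustration only).
vocab_size, embed_dim, num_classes = 10000, 10, 5

rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # word look-up table
B = rng.normal(scale=0.1, size=(embed_dim, num_classes))  # linear classifier

def forward(word_ids):
    """Average word embeddings into a text vector, then classify it."""
    text_vec = A[word_ids].mean(axis=0)   # hidden text representation
    logits = text_vec @ B
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over the classes
    return text_vec, probs

def nll_loss(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label])

# Example: a "document" of 6 word ids with a made-up true label 2.
text_vec, probs = forward(np.array([4, 250, 3, 87, 4, 999]))
print(probs, nll_loss(probs, 2))
```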
2.1 Hierarchical softmax
When the number of classes is large, computing the linear classifier becomes expensive, so a hierarchical softmax based on the Huffman coding tree is used to reduce the cost.
Based on Huffman coding: each node is associated with the probability of the path from the root down to that node.
Complexity drops from O(kh) to O(h log2(k)), where k is the number of classes and h the dimension of the text representation.
Another advantage: the hierarchical softmax is also fast at test time when searching for the most likely class.
Each node is associated with a probability that is the probability of the path from the root to that node. If the node is at depth l+1 with parents n_1, ..., n_l, its probability is P(n_{l+1}) = ∏_{i=1}^{l} P(n_i).
The probability of a node is therefore always lower than that of its parent. Exploring the tree with a depth-first search while tracking the maximum probability among the leaves allows branches with small probabilities to be discarded.
This approach is further extended to compute the T-top targets at the cost of O(log(T)), using a binary heap.
The words and word n-grams in the input layer form the feature vector, which is mapped to the hidden layer through a linear transformation; the model is trained by maximizing the likelihood, and a Huffman tree built from the weight of each class and the model parameters serves as the output layer.
fastText also exploits the fact that the classes are imbalanced (some classes appear far more often than others) by using the Huffman algorithm to build the tree that represents the classes. Consequently, frequent classes sit at smaller depths in the tree than infrequent ones, which makes the computation even more efficient.
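The following is a minimal sketch of this idea, assuming made-up class frequencies: a Huffman tree is built over the classes with Python's heapq, and each leaf's probability is the product of the binary decisions along its root-to-leaf path (random scores stand in for the real node weights). It only illustrates the structure, not fastText's actual code.

```python
import heapq
import itertools
import math
import random

# Made-up class frequencies: frequent classes end up closer to the root.
class_freq = {"news": 500, "sports": 300, "tech": 120, "art": 50, "misc": 30}

# Build a Huffman tree: repeatedly merge the two least frequent nodes.
tie = itertools.count()  # tie-breaker so heapq never compares dict nodes
heap = [(f, next(tie), {"label": c}) for c, f in class_freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (f1 + f2, next(tie), {"left": left, "right": right}))
root = heap[0][2]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def leaf_probs(node, parent_prob=1.0):
    """Yield (label, probability) for every leaf.

    Each internal node makes a binary (sigmoid) decision; a leaf's probability
    is the product of the decisions on its root-to-leaf path, so it can never
    exceed its parent's probability. The decision scores are random stand-ins
    for the dot product of the node weights with the text vector.
    """
    if "label" in node:
        yield node["label"], parent_prob
        return
    p_right = sigmoid(random.uniform(-2, 2))
    yield from leaf_probs(node["left"], parent_prob * (1.0 - p_right))
    yield from leaf_probs(node["right"], parent_prob * p_right)

probs = dict(leaf_probs(root))
print(probs, "sum =", sum(probs.values()))  # leaf probabilities sum to 1
```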
2.2 N-gram features
For a given text, the bag-of-words (BoW) model ignores word order, grammar and syntax: the text is treated simply as a collection of words, and each word's occurrence is independent of whether any other word occurs.
we use a bag of n-grams as additional features to capture some partial information about the local word order.
N-gram features are added to capture local word order; the hashing trick is used to reduce the dimensionality.
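A minimal sketch of the bigram + hashing-trick idea; the bucket count and the use of Python's built-in hash() are illustrative assumptions (the real fastText uses its own stable hash function and bucket size).

```python
# Map word bigrams into a fixed number of hash buckets so the model never
# has to store an explicit (and huge) bigram vocabulary.
NUM_BUCKETS = 2_000_000  # made-up size

def bigram_bucket_ids(tokens, num_buckets=NUM_BUCKETS):
    """Map each bigram to a bucket id; collisions are accepted by design.
    Note: hash() is randomized per process; a real system needs a stable hash."""
    return [hash(w1 + " " + w2) % num_buckets
            for w1, w2 in zip(tokens, tokens[1:])]

tokens = "the movie was surprisingly good".split()
print(bigram_bucket_ids(tokens))  # four bucket ids, one per bigram
```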
3 Experiments
First, we compare it to existing text classifiers on the problem of sentiment analysis.
Then, we evaluate its capacity to scale to large output space on a tag prediction dataset.
① Sentiment analysis
② Tag prediction with a very large output space
3.1 Sentiment analysis
We present the results in Figure 1. We use 10 hidden units and run fastText for 5 epochs with a learning rate selected on a validation set from {0.05, 0.1, 0.25, 0.5}.
On this task, adding bigram information improves the performance by 1-4%. Overall our accuracy is slightly better than char-CNN and char-CRNN, and a bit worse than VDCNN.
Note that we can increase the accuracy slightly by using more n-grams, for example with trigrams.
We tune the hyperparameters on the validation set and observe that using n-grams up to 5 leads to the best performance.
3.2 Tag prediction
To test the scalability of our approach, further evaluation is carried out on the YFCC100M dataset, which consists of almost 100M images with captions, titles and tags. We focus on predicting the tags according to the title and caption (we do not use the images).
We remove the words and tags occurring less than 100 times and split the data into a train, validation and test set.
We consider a frequency-based baseline which predicts the most frequent tag, and we also compare with Tagspace, for which we consider the linear version.
We run fastText for 5 epochs and compare it to Tagspace for two sizes of the hidden layer, i.e., 50 and 200. Both models achieve a similar performance with a small hidden layer, but adding bigrams gives us a significant boost in accuracy.
At test time, Tagspace needs to compute the scores for all the classes which makes it relatively slow, while our fast inference gives a significant speed-up when the number of classes is large (more than 300K here).
Overall, we are more than an order of magnitude faster at obtaining a model with better quality.
4 Discussion and conclusion
Unlike unsupervisedly trained word vectors from word2vec, our word features can be averaged together to form good sentence representations.
In several tasks, fastText obtains performance on par with recently proposed methods inspired by deep learning, while being much faster.
Although deep neural networks have in theory much higher representational power than shallow models, it is not clear if simple text classification problems such as sentiment analysis are the right ones to evaluate them.
The input is a sentence; x1 through xN are the words or n-grams of that sentence. Each of them corresponds to a vector, and averaging these vectors gives the text vector, which is then used to predict the label. When there are only a few classes, this is just the plainest softmax; when the number of labels is huge, hierarchical softmax is needed. Since the paper introduces n-gram vectors in addition to word vectors, and the number of n-grams is very large, the model would have too many parameters; hashing buckets are therefore used, which may map several n-grams to the same vector. This saves a great deal of memory.
Comparison between word2vec and fastText:
word2vec averages the word vectors of the words in a local context window to predict the center word; fastText averages the word vectors of the whole sentence (or document) to predict the label.
word2vec does not use the plain softmax, because there are far too many words to predict; it uses hierarchical softmax or negative sampling instead. fastText uses the plain softmax when the number of labels is small and hierarchical softmax when it is large. fastText does not use negative sampling, because negative sampling does not produce proper probabilities.
Background:
Negative log-likelihood function
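In the paper, the model is trained by minimizing the negative log-likelihood over the N training documents, where x_n is the normalized bag of features of the n-th document, y_n its label, A the look-up matrix, B the classifier weights and f the softmax function:

-\frac{1}{N}\sum_{n=1}^{N} y_n \log\big(f(B A x_n)\big)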
Code:
After embedding each word in the sentence, the word representations are averaged into a text representation, which is in turn fed to a linear classifier.
It uses a softmax function to compute the probability distribution over the predefined classes.
Then cross-entropy is used to compute the loss.
The bag-of-words representation does not consider word order.
To take word order into account, n-gram features are used to capture some partial information about the local word order.
When the number of classes is large, computing the linear classifier is computationally expensive, so hierarchical softmax is used to speed up training.
    use bi-grams and/or tri-grams
    use NCE loss to speed up the softmax computation (not the hierarchical softmax of the original paper)
Training the model (a minimal sketch follows the steps below):
1. load data (X: list of int, y: int)
2. create session
3. feed data
4. training
(5. validation)
(6. prediction)
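A minimal sketch of these steps in the TensorFlow 1.x session style the list implies. All names, shapes and hyperparameters are made-up placeholders; a plain softmax cross-entropy is used instead of the NCE loss mentioned above, and the learning rate is kept fixed rather than linearly decayed, to keep the sketch short.

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # run in graph/session mode

VOCAB, SEQ_LEN, EMBED, CLASSES = 10000, 20, 10, 5  # made-up sizes

# 1. load data: random ids stand in for the real padded word-id matrix.
X_train = np.random.randint(0, VOCAB, size=(512, SEQ_LEN)).astype(np.int32)
y_train = np.random.randint(0, CLASSES, size=(512,)).astype(np.int32)

# Graph: embed -> average -> linear classifier -> softmax cross-entropy.
x_ph = tf.compat.v1.placeholder(tf.int32, [None, SEQ_LEN])
y_ph = tf.compat.v1.placeholder(tf.int32, [None])
embeddings = tf.compat.v1.get_variable("A", [VOCAB, EMBED])
W = tf.compat.v1.get_variable("B", [EMBED, CLASSES])
text_vec = tf.reduce_mean(tf.nn.embedding_lookup(embeddings, x_ph), axis=1)
logits = tf.matmul(text_vec, W)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_ph, logits=logits))
train_op = tf.compat.v1.train.GradientDescentOptimizer(0.5).minimize(loss)

# 2. create session / 3. feed data / 4. training
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    for epoch in range(5):
        for i in range(0, len(X_train), 64):
            batch = {x_ph: X_train[i:i + 64], y_ph: y_train[i:i + 64]}
            _, batch_loss = sess.run([train_op, loss], feed_dict=batch)
        print("epoch", epoch, "loss", batch_loss)
    # (5./6.) validation and prediction would run sess.run(logits, ...) on held-out data.
```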