Source: IJCAI-16
The paper is based on unsupervised learning of distributed representations of words and dependency paths.
Basic idea: in the dependency embedding space, two words are connected by the dependency path between them.
The model optimizes w1 + r ≈ w2 in a low-dimensional space, where a multi-hop dependency path is treated as a sequence of grammatical relations and modeled by a recurrent neural network. Embedding features covering both the linear context and the dependency context are then used for CRF-based aspect term extraction.
Results: 1) with the embedding features alone, the method already achieves good results; 2) adding syntactic information on top of the word embeddings yields even better performance.
Mainstream approaches: 1) The unsupervised (or rule-based) methods rely on a set of manually defined opinion words as seeds, plus rules derived from syntactic parse trees, to iteratively extract aspect terms. 2) The supervised methods treat ATE as a sequence labeling problem, and conditional random fields (CRF) are the dominant model.
Representation learning: 1) word embeddings; 2) structured embeddings of knowledge bases.
This paper: focuses on representation learning for aspect term extraction under an unsupervised framework, by learning distributed representations of words and dependency paths from the text corpus.
The learned embeddings of words and dependency paths are utilized as features in CRF for aspect term extraction.
Problem: the embeddings are real-valued and not necessarily in a bounded range.
This paper: first maps the continuous embeddings into discrete embeddings to make them more appropriate for the CRF model (see the sketch below). Then it constructs the embedding features, which consist of the target word embedding, the linear context embedding, and the dependency context embedding, for aspect term extraction.
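The exact discretization scheme is not spelled out in the note; a minimal sketch, assuming simple equal-width binning of each embedding dimension (num_bins is a hypothetical parameter):

```python
import numpy as np

def discretize_embeddings(emb_matrix, num_bins=10):
    """Map real-valued embeddings to per-dimension integer bins,
    so each dimension can serve as a categorical CRF feature.

    emb_matrix: (vocab_size, dim) array of continuous embeddings.
    """
    lo = emb_matrix.min(axis=0, keepdims=True)
    hi = emb_matrix.max(axis=0, keepdims=True)
    # rescale each dimension to [0, 1], then cut into equal-width bins
    scaled = (emb_matrix - lo) / np.maximum(hi - lo, 1e-12)
    return np.minimum((scaled * num_bins).astype(int), num_bins - 1)
```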
Related Work:
Unsupervised learning: association rule mining; beyond that, opinion words are used to extract infrequent aspect terms. The dependency relation is used as a crucial clue, and the double propagation method can iteratively extract aspect terms and opinion words.
Supervised learning: the mainstream method is still CRF. Li et al. [2010] proposed a new machine learning framework on top of CRF that jointly extracts positive opinion words, negative opinion words, and aspect terms.
Dependency paths: contain rich linguistic information between words.
This paper: learns the semantic composition of dependency paths over dependency trees.
Method:
First, triples (w1, w2, r) are extracted from the dependency trees, where w1 and w2 are two words and the corresponding dependency path r is the shortest path from w1 to w2, consisting of a sequence of grammatical relations; see the sketch below.
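A minimal sketch of this triple extraction, assuming the dependency tree is already given as labeled edges (the toy sentence and relation labels are illustrative, not from the paper):

```python
import itertools
import networkx as nx

# toy dependency tree for "the screen is great": (head, dependent, relation)
edges = [("is", "screen", "nsubj"), ("is", "great", "acomp"),
         ("screen", "the", "det")]

# treat the tree as an undirected graph so the shortest path between
# any two words is well defined
G = nx.Graph()
for head, dep, rel in edges:
    G.add_edge(head, dep, rel=rel)

triples = []
for w1, w2 in itertools.permutations(G.nodes, 2):
    path = nx.shortest_path(G, w1, w2)
    # dependency path r = the sequence of grammatical relations along the path
    r = tuple(G.edges[a, b]["rel"] for a, b in zip(path, path[1:]))
    if len(r) <= 3:  # the paper caps the hop number at 3
        triples.append((w1, w2, r))
```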
We notice that considering lexicalized dependency paths can provide more information for the embedding learning. However, they would require memorizing many more dependency path frequencies for the learning method (negative sampling). The dependency paths are therefore kept unlexicalized, i.e., pure sequences of grammatical relations (considering n-hop dependency paths).
|Vword| is the size of the word vocabulary, more than 100,000, while Vdep is the set of grammatical relations and |Vdep| is only about 50.
Loss function:
C1 denotes the set of triples extracted from the dependency trees, which are in turn parsed from the text corpus; r is a sequence of grammatical relations (g1, g2, ..., gn), where n is the hop number of r, gi is the i-th grammatical relation in r, and p(r) is the marginal distribution of r. The loss function ensures that an observed triple (w1, w2, r) receives a higher ranking score than a randomly picked triple (w1, w2, r'). The ranking score is measured by the inner product of the vector r (or r') with the vector w2 - w1.
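The formula itself did not survive in the note; a margin-based ranking loss consistent with the description above would look like the following (a reconstruction with margin 1 assumed, not copied verbatim from the paper):

$$\mathcal{L} = \sum_{(w_1, w_2, r) \in C_1} \mathbb{E}_{r' \sim p(r)} \left[ \max\left(0,\; 1 - (w_2 - w_1)^\top r + (w_2 - w_1)^\top r'\right) \right]$$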
A recurrent neural network is used to learn the compositional representations of multi-hop dependency paths. The composition operation is implemented with a matrix W:
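The formula after the colon is missing from the note; given the symbol definitions in the next line, the recurrence is presumably:

$$h_1 = g_1, \qquad h_i = f\left(W \left[h_{i-1};\, g_i\right]\right), \quad i = 2, \dots, n, \qquad r = h_n$$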
f is the hard hyperbolic tangent function (hTanh), [a; b] is the concatenation of two vectors, and gi is the embedding of the grammatical relation gi. Set h1 = g1 and iterate the composition operation to obtain the final path representation r = hn. The hop number is at most 3, since larger values are too time-consuming. A NumPy sketch of the composition follows.
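A minimal NumPy sketch of this composition (dimensions and initialization are illustrative assumptions):

```python
import numpy as np

def htanh(x):
    # hard hyperbolic tangent: clip values to [-1, 1]
    return np.clip(x, -1.0, 1.0)

def compose_path(g_embs, W):
    """Compose a multi-hop dependency path embedding.

    g_embs: list of n relation embeddings, each of shape (d,)
    W: composition matrix of shape (d, 2d)
    Returns the path representation r = h_n.
    """
    h = g_embs[0]  # h1 = g1
    for g in g_embs[1:]:
        h = htanh(W @ np.concatenate([h, g]))  # h_i = f(W [h_{i-1}; g_i])
    return h

# toy usage: a 2-hop path with 5-dimensional relation embeddings
rng = np.random.default_rng(0)
d = 5
W = rng.normal(scale=0.1, size=(d, 2 * d))
g1, g2 = rng.normal(size=d), rng.normal(size=d)
r = compose_path([g1, g2], W)
```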
Multi-task learning with linear context:
Linear context: based on the distributional hypothesis, which assumes that words appearing in similar contexts have similar meanings. Inspired by Skip-gram, the word embeddings are enhanced by maximizing the prediction accuracy of a context word c that occurs in the linear context of a target word w. Each word plays two roles: the target word, and the context word of other target words.
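With negative sampling (mentioned under model training below), this corresponds to the standard Skip-gram objective, shown here as a reference formulation rather than a quote from the paper:

$$\max \sum_{(w, c)} \Big[ \log \sigma(v_c^\top v_w) + \sum_{k=1}^{K} \mathbb{E}_{c_k \sim P_n(c)} \log \sigma(-v_{c_k}^\top v_w) \Big]$$

where $\sigma$ is the sigmoid, $K$ is the number of negative samples, and $P_n(c)$ is the noise distribution over context words.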
Model training:
Negative sampling is used to train the embedding model; a sketch of one update step follows.
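A minimal sketch of one negative-sampling update for the linear-context objective (learning rate and noise distribution are assumptions; this is the generic word2vec-style step, not the paper's exact procedure):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_step(W_tgt, W_ctx, w, c, neg_ids, lr=0.025):
    """One SGD step: pull (w, c) together, push w away from sampled negatives.

    W_tgt, W_ctx: target / context embedding matrices, shape (V, d).
    w, c: indices of the target word and its observed context word.
    neg_ids: indices of K context words sampled from the noise distribution.
    """
    v_w = W_tgt[w].copy()
    grad_w = np.zeros_like(v_w)
    for c_id, label in [(c, 1.0)] + [(n, 0.0) for n in neg_ids]:
        v_c = W_ctx[c_id]
        g = lr * (label - sigmoid(v_c @ v_w))  # scalar error term
        grad_w += g * v_c
        W_ctx[c_id] += g * v_w
    W_tgt[w] += grad_w
```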
Aspect Term Extraction with Embeddings:
The (discretized) word, linear-context, and dependency-context embedding features are fed into a CRF for sequence labeling; a usage sketch follows.
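A minimal sketch of feeding discretized embedding features to a CRF tagger, using the sklearn-crfsuite package as one possible implementation (feature names, toy data, and labels are illustrative):

```python
import numpy as np
import sklearn_crfsuite

def word_features(sent_embs, i):
    """Categorical features for token i: its own discretized embedding
    dimensions plus those of its linear-context neighbors."""
    feats = {f"w_dim{d}": str(v) for d, v in enumerate(sent_embs[i])}
    if i > 0:
        feats.update({f"prev_dim{d}": str(v)
                      for d, v in enumerate(sent_embs[i - 1])})
    if i + 1 < len(sent_embs):
        feats.update({f"next_dim{d}": str(v)
                      for d, v in enumerate(sent_embs[i + 1])})
    return feats

# toy data: one 3-token sentence with 4-dimensional discretized embeddings
sent_embs = np.array([[1, 0, 3, 2], [0, 2, 2, 1], [3, 1, 0, 0]])
X = [[word_features(sent_embs, i) for i in range(len(sent_embs))]]
y = [["O", "B", "O"]]  # BIO-style aspect term labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```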