Several mechanisms to focus the attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it has probably had the largest impact on neural machine translation.
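To make the mechanism concrete, here is a minimal NumPy sketch of content-based soft attention, where a query concentrates weight on a few memory items; the function name soft_attention, the dot-product scoring, and the toy shapes are illustrative assumptions, not the exact formulation of any model mentioned above.

```python
import numpy as np

def soft_attention(query, memory):
    """Content-based soft attention: focus on a weighted selection of memory.

    query:  vector of shape (d,)
    memory: matrix of shape (n, d), one row per memory item
    Returns a single context vector of shape (d,).
    """
    # Similarity of the query to every memory item (dot-product scores).
    scores = memory @ query                      # shape (n,)
    # Softmax turns the scores into a distribution that typically puts
    # most of its mass on a few items -- the "focused" part of memory.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # shape (n,)
    # The context is a weighted average, dominated by the selected items.
    return weights @ memory                      # shape (d,)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 4))   # 6 memory items of dimension 4
query = rng.normal(size=4)
context = soft_attention(query, memory)
print(context.shape)               # (4,)
```

The key property is that the softmax makes the model read from a small, selected part of memory at each step.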
Recently, similar improvements have been obtained using alternative mechanisms that do not focus on a single part of memory but operate on all of it in parallel, in a uniform way. Such a mechanism, which we call active memory, has improved over attention in algorithmic tasks, image processing, and generative modelling.
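For contrast, here is a rough NumPy sketch of what "operating on all of memory in parallel, in a uniform way" can look like, in the spirit of convolutional models such as the Neural GPU; the function active_memory_step, the 1-D three-cell kernel, and the tanh nonlinearity are assumptions chosen for illustration, not the specific model analyzed in the paper.

```python
import numpy as np

def active_memory_step(memory, kernel):
    """One active-memory step: rewrite every memory cell in parallel,
    applying the same (uniform) local operator across the whole memory.

    memory: matrix of shape (n, d) -- n cells, each a d-dimensional vector
    kernel: array of shape (3, d, d) -- one linear map per neighbour offset
    Returns an updated memory of the same shape (n, d).
    """
    n, d = memory.shape
    # Pad with zero cells so every position has a left and right neighbour.
    padded = np.vstack([np.zeros((1, d)), memory, np.zeros((1, d))])
    new_memory = np.zeros_like(memory)
    for i in range(n):
        # Each cell is recomputed from its local neighbourhood; the same
        # kernel is used at every position and all positions are updated,
        # rather than attending to one selected cell.
        neighbourhood = padded[i:i + 3]          # shape (3, d)
        new_memory[i] = np.tanh(
            sum(neighbourhood[k] @ kernel[k] for k in range(3))
        )
    return new_memory

# Tiny usage example with random data.
rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 4))
kernel = rng.normal(size=(3, 4, 4)) * 0.1
memory = active_memory_step(memory, kernel)   # all 6 cells updated at once
print(memory.shape)                            # (6, 4)
```

Unlike the attention sketch above, no cell is singled out: the whole memory is transformed at every step, which is the defining trait of active memory.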
So far, however, active memory has not improved over attention for most natural language processing tasks, in particular for machine translation. In this paper we analyze this shortcoming and propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings the most benefits and where attention can be a better choice.