2018 NAACL: A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM).

Abstract

Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.

Multiple deep learning based models have demonstrated good results on these tasks.

The most effective algorithms are based on the structures of sequence to sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models (Yao et al., 2014; Mesnil et al., 2015; Peng and Yao, 2015; Kurata et al., 2016; Hahn et al., 2011) or a joint model (Liu and Lane, 2016a; Hakkani-Tür et al., 2016; Guo et al., 2014).

Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent.

Most of these approaches use one (joint) NN based model (including the encoder-decoder structure) to model the two tasks, hence may not fully take advantage of the cross impact between them.

In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM).

Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data (Hemphill et al., 1990; Tur et al., 2010), with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.

1 Introduction

The research on spoken language understanding (SLU) system has progressed extremely fast during the past decades.

Two important tasks in an SLU system are intent detection and slot filling.

These two tasks are normally considered as parallel tasks but may have cross-impact on each other.

Intent detection is treated as an utterance classification problem, which can be modeled using conventional classifiers including regression, support vector machines (SVMs) or even deep neural networks (Haffner et al., 2003; Sarikaya et al., 2011).

The slot filling task can be formulated as a sequence labeling problem, and the most popular approaches with good performance use conditional random fields (CRFs) and recurrent neural networks (RNNs), as in recent work (Xu and Sarikaya, 2013).

Some works also suggested using one joint RNN model for generating results of the two tasks together, by taking advantage of the sequence to sequence (Sutskever et al., 2014) (or encoder-decoder) model, which also gives decent results as in the literature (Liu and Lane, 2016a).

In this paper, Bi-model based RNN structures are proposed to take the cross-impact between the two tasks into account, hence further improving the performance of modeling an SLU system.

These models can generate the intent and semantic tags concurrently for each utterance.

In our Bi-model structures, two task-networks are built for the purpose of intent detection and slot filling.

Each task-network includes one BLSTM with or without an LSTM decoder (Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005).

The paper is organized as follows: in Section 2, a brief overview of existing deep learning approaches for intent detection and slot filling is given.

The newly proposed Bi-model based RNN approach will be illustrated in detail in Section 3.

In Section 4, two experiments on different datasets will be given.

One is performed on the ATIS benchmark dataset, in order to demonstrate a state-of-the-art result for both semantic parsing tasks.

The other experiment is tested on our internal multi-domain dataset by comparing our new algorithm with the current best-performing RNN based joint model in the literature for intent detection and slot filling.

2 Background

In this section, a brief background overview on using deep learning and RNN based approaches to perform intent detection and slot filling tasks is given.

The joint model algorithm is also discussed for further comparison purposes.

2.1 Deep neural network for intent detection

Using deep neural networks for intent detection is similar to a standard classification problem; the only difference is that this classifier is trained under a specific domain.

For example, all data in ATIS dataset is under the flight reservation domain with 18 different intent labels.

There are mainly two types of models that can be used: one is a feed-forward model that takes the average of all word vectors in an utterance as its input; the other uses a recurrent neural network that reads each word in an utterance as a vector, one by one (Xu and Sarikaya, 2014).
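
As a rough illustration of these two options, here is a minimal PyTorch sketch (not from the paper; vocabulary, embedding, and layer sizes are made up) of a feed-forward classifier over averaged word vectors next to an LSTM classifier that reads the utterance word by word:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, N_INTENTS = 1000, 64, 128, 18   # illustrative sizes

class AvgFeedForwardIntent(nn.Module):
    """Feed-forward intent classifier over the average of all word vectors."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.clf = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU(), nn.Linear(HID, N_INTENTS))
    def forward(self, tokens):                     # tokens: (batch, seq_len)
        return self.clf(self.emb(tokens).mean(dim=1))

class RNNIntent(nn.Module):
    """Recurrent intent classifier that reads the utterance one word vector at a time."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.clf = nn.Linear(HID, N_INTENTS)
    def forward(self, tokens):
        _, (h_n, _) = self.rnn(self.emb(tokens))   # final hidden state of the sequence
        return self.clf(h_n[-1])

utterances = torch.randint(0, VOCAB, (2, 7))       # a toy batch of two 7-word utterances
print(AvgFeedForwardIntent()(utterances).shape, RNNIntent()(utterances).shape)
```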

2.2 Recurrent neural network for slot filling

The slot filling task is a bit different from intent detection as there are multiple outputs for the task, hence only the RNN model is a feasible approach for this scenario.

The most straightforward way is using a single RNN model to generate multiple semantic tags sequentially by reading in each word one by one (Liu and Lane, 2015; Mesnil et al., 2015; Peng and Yao, 2015).

This approach has a constraint that the number of slot tags generated should be the same as the number of words in an utterance.

Another way to overcome this limitation is by using an encoder-decoder model containing two RNN models as an encoder for input and a decoder for output (Liu and Lane, 2016a).

The advantage of doing this is that it gives the system the capability of matching an input utterance and output slot tags with different lengths without the need of alignment. Besides using RNNs, it is also possible to use a convolutional neural network (CNN) together with a conditional random field (CRF) to perform the slot filling task (Xu and Sarikaya, 2013).
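
For concreteness, a minimal sketch of the single-RNN slot filling setup described above, where one tag distribution is emitted per input word so the output length matches the utterance length (all sizes and names are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, N_TAGS = 1000, 64, 128, 127       # illustrative sizes

class RNNSlotFiller(nn.Module):
    """Single-RNN slot filler: one slot tag distribution per input word."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * HID, N_TAGS)
    def forward(self, tokens):                      # tokens: (batch, seq_len)
        states, _ = self.rnn(self.emb(tokens))      # (batch, seq_len, 2*HID)
        return self.tagger(states)                  # output length equals input length

slot_logits = RNNSlotFiller()(torch.randint(0, VOCAB, (2, 7)))
print(slot_logits.shape)                            # torch.Size([2, 7, 127])
```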

2.3 Joint model for two tasks
It is also possible to use one joint model for intent detection and slot filling (Guo et al., 2014; Liu and Lane, 2016a,b; Zhang and Wang, 2016; Hakkani-Tür et al., 2016). One way is by using one encoder with two decoders: the first decoder generates sequential semantic tags and the second decoder generates the intent.

Another approach consolidates the hidden state information from an RNN slot filling model, then generates its intent using an attention model (Liu and Lane, 2016a).

Both approaches demonstrate very good results on the ATIS dataset.
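
A highly simplified sketch of the joint-model idea, assuming a single shared BLSTM encoder with two output heads (one per-token slot head and one utterance-level intent head); the cited works add encoder-decoder and attention mechanisms on top of this basic shape:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, N_INTENTS, N_TAGS = 1000, 64, 128, 18, 127   # illustrative sizes

class JointModel(nn.Module):
    """One shared encoder, two heads: per-token slot tags and an utterance-level intent."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * HID, N_TAGS)
        self.intent_head = nn.Linear(2 * HID, N_INTENTS)
    def forward(self, tokens):
        h, _ = self.encoder(self.emb(tokens))       # shared hidden states
        return self.intent_head(h[:, -1]), self.slot_head(h)

intent_logits, slot_logits = JointModel()(torch.randint(0, VOCAB, (2, 7)))
print(intent_logits.shape, slot_logits.shape)       # (2, 18) (2, 7, 127)
```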

3 Bi-model RNN structures for joint semantic frame parsing

Despite the success of RNN based sequence to sequence (or encoder-decoder) models on both tasks, most of the approaches in the literature still use one single RNN model for each task or for both tasks.

They treat the intent detection and slot filling as two separate tasks.

In this section, two new Bi-model structures are proposed to take their cross-impact into account, hence further improve their performance.
在本節(jié)中透硝,提出了兩個(gè)新的雙模型結(jié)構(gòu)狰闪,以將它們的交叉影響考慮在內(nèi),從而進(jìn)一步提高它們的性能濒生。

One structure takes advantage of a decoder structure and the other doesn't.

An asynchronous training approach based on two models’ cost functions is designed to adapt to these new structures.

3.1 Bi-model RNN Structures

A graphical illustration of two Bi-model structures with and without a decoder is shown in Figure 1.


[Figure 1: The two Bi-model structures, (a) with an LSTM decoder and (b) without a decoder]

The two structures are quite similar to each other except that Figure 1a contains an LSTM based decoder, hence there is an extra decoder state s_t to be cascaded besides the encoder state h_t.

Remarks:
The concept of using information from multiple models/multiple modalities to achieve better performance has been widely used in deep learning (Dean et al., 2012; Wang, 2017; Ngiam et al., 2011; Srivastava and Salakhutdinov, 2012), system identification (Murray-Smith and Johansen, 1997; Narendra et al., 2014, 2015) and, recently, the reinforcement learning field (Narendra et al., 2016; Wang and Jin, 2018).

Instead of using collective information, in this paper, our work introduces a totally new approach of training multiple neural networks asynchronously by sharing their internal state information.

3.1.1 Bi-model structure with a decoder

The Bi-model structure with a decoder is shown as in Figure 1a.

There are two inter-connected bidirectional LSTMs (BLSTMs) in the structure, one is for intent detection and the other is for slot filling.

Each BLSTM reads in the input utterance sequence (x_1, x_2, ..., x_n) forward and backward, and generates two sequences of hidden states hf_t and hb_t.

A concatenation of hf_t and hb_t forms a final BLSTM state h_t = [hf_t, hb_t] at time step t.

Hence, our bidirectional LSTM fi(·) generates a sequence of hidden states (h^i_1, h^i_2, ..., h^i_n), where i = 1 corresponds to the network for the intent detection task and i = 2 to the slot filling task.
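
As a small aside, PyTorch's bidirectional LSTM already returns the concatenated state h_t = [hf_t, hb_t] at every time step, which is one way to realize the BLSTM described here (sizes below are illustrative):

```python
import torch
import torch.nn as nn

blstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(1, 5, 8)                  # one utterance of 5 word vectors
h, _ = blstm(x)                           # h: (1, 5, 32), i.e. [forward ; backward] per step
hf_t, hb_t = h[0, 2, :16], h[0, 2, 16:]   # recover hf_t and hb_t at time step t = 2
print(h.shape, hf_t.shape, hb_t.shape)
```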

In order to detect the intent, hidden state h^1_t is combined with h^2_t from the other bidirectional LSTM f2(·) in the slot filling task-network to generate the state of g1(·), s^1_t, at time step t:

s^1_t = \phi(s^1_{t-1}, h^1_{t-1}, h^2_{t-1})

y^1_{intent} = \arg\max_{\hat{y}^1_n} P(\hat{y}^1_n | s^1_{n-1}, h^1_{n-1}, h^2_{n-1}) \qquad\qquad (1)

where \hat{y}^1_n contains the predicted probabilities for all intent labels at the last time step n.

For the slot filling task, a similar network structure is constructed with a BLSTM f2(·) and an LSTM g2(·). f2(·) is the same as f1(·), reading in the word sequence as its input.

The difference is that there will be an output y^2_t at each time step t for g2(·), as it is a sequence labeling problem.

At each time step t:

s^2_t = \psi(h^2_{t-1}, h^1_{t-1}, s^2_{t-1}, y^2_{t-1})

y^2_t = \arg\max_{\hat{y}^2_t} P(\hat{y}^2_t | h^1_{t-1}, h^2_{t-1}, s^2_{t-1}, y^2_{t-1}) \qquad\qquad (2)
where y^2_t is the predicted semantic tag at time step t.

3.1.2 Bi-Model structure without a decoder

The Bi-model structure without a decoder is shown as in Figure 1b.

In this model, there is no LSTM decoder as in the previous model.

For the intent task, only one predicted output label y^1_{intent} is generated from BLSTM f1(·) at the last time step n, where n is the length of the utterance.

Similarly, the state value h^1_t and the output intent label are generated as:

h^1_t = \phi(h^1_{t-1}, h^2_{t-1})

y^1_{intent} = \arg\max_{\hat{y}^1_n} P(\hat{y}^1_n | h^1_{n-1}, h^2_{n-1}) \qquad\qquad (3)

For the slot filling task, the basic structure of BLSTM f2(·) is similar to that for the intent detection task f1(·), except that there is one slot tag label y^2_t generated at each time step t.

It also takes the hidden states from the two BLSTMs f1(·) and f2(·), i.e. h^1_{t-1} and h^2_{t-1}, plus the output tag y^2_{t-1}, to generate its next state value h^2_t and also the slot tag y^2_t. To represent this as a function mathematically:

h^2_t = \phi(h^1_{t-1}, h^2_{t-1}, y^2_{t-1})

y^2_t = \arg\max_{\hat{y}^2_t} P(\hat{y}^2_t | h^1_{t-1}, h^2_{t-1}, y^2_{t-1}) \qquad\qquad (4)
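
A simplified, hypothetical sketch of the decoder-less recurrences in Eqs. (3)-(4); each task-network is reduced here to a single forward LSTMCell (the paper uses BLSTMs), and the two cells exchange their previous hidden states at every time step:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, N_INTENTS, N_TAGS = 1000, 64, 128, 18, 127   # illustrative sizes

emb = nn.Embedding(VOCAB, EMB)
cell1 = nn.LSTMCell(EMB + HID, HID)            # intent cell also reads h2_{t-1}
cell2 = nn.LSTMCell(EMB + HID + N_TAGS, HID)   # slot cell also reads h1_{t-1} and y2_{t-1}
intent_out, slot_out = nn.Linear(HID, N_INTENTS), nn.Linear(HID, N_TAGS)

tokens = torch.randint(0, VOCAB, (2, 7))       # toy batch of two 7-word utterances
x = emb(tokens)
h1 = c1 = h2 = c2 = x.new_zeros(2, HID)
y2_prev = x.new_zeros(2, N_TAGS)
slot_tags = []
for t in range(tokens.size(1)):
    h1_prev, h2_prev = h1, h2                  # states from the previous time step
    h1, c1 = cell1(torch.cat([x[:, t], h2_prev], dim=-1), (h1, c1))           # cf. Eq. (3)
    h2, c2 = cell2(torch.cat([x[:, t], h1_prev, y2_prev], dim=-1), (h2, c2))  # cf. Eq. (4)
    y2_prev = slot_out(h2)                     # slot tag logits at step t
    slot_tags.append(y2_prev.argmax(dim=-1))
intent = intent_out(h1).argmax(dim=-1)         # intent read off at the last time step n
print(intent.shape, torch.stack(slot_tags, dim=1).shape)   # (2,) (2, 7)
```
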
3.1.3 Asynchronous training
One of the major differences in the Bi-model structure is its asynchronous training, which trains two task-networks based on their own cost functions in an asynchronous manner.

The loss function for the intent detection task-network is \mathcal{L}_1, and for slot filling it is \mathcal{L}_2. \mathcal{L}_1 and \mathcal{L}_2 are defined using cross entropy as:

\mathcal{L}_1 \triangleq -\sum_{i=1}^{k} \hat{y}^{1,i}_{intent} \log(y^{1,i}_{intent}) \qquad (5)

and

\mathcal{L}_2 \triangleq -\sum_{j=1}^{n} \sum_{i=1}^{m} \hat{y}^{2,i}_{j} \log(y^{2,i}_{j}) \qquad (6)
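
For reference, a short sketch of how the two separate cross-entropy losses in Eqs. (5)-(6) would be computed on illustrative logits (tensor names and sizes are assumptions):

```python
import torch
import torch.nn.functional as F

batch, n, k, m = 2, 7, 18, 127                 # k intent types, m tag types, n words
intent_logits = torch.randn(batch, k)          # stand-ins for the networks' outputs
slot_logits = torch.randn(batch, n, m)
intent_gold = torch.randint(0, k, (batch,))
slot_gold = torch.randint(0, m, (batch, n))

loss_1 = F.cross_entropy(intent_logits, intent_gold)                          # cf. Eq. (5)
loss_2 = F.cross_entropy(slot_logits.reshape(-1, m), slot_gold.reshape(-1))   # cf. Eq. (6)
print(loss_1.item(), loss_2.item())
```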

where k is the number of intent label types, m is the number of semantic tag types and n is the number of words in a word sequence.

In each training iteration, both the intent detection and slot filling networks will generate groups of hidden states h^1 and h^2 from the models in the previous iteration.

The intent detection task-network reads in a batch of input data x_i and hidden states h^2, and generates the estimated intent labels \hat{y}^1_{intent}.

The intent detection task-network computes its cost based on the loss function \mathcal{L}_1 and is trained on that.

Then the same batch of data x_i will be fed into the slot filling task-network together with the hidden states h^1 from the intent task-network, and further generates a batch of outputs y^2_i for each time step.

Its cost value is then computed based on the cost function \mathcal{L}_2, and the network is further trained on that.
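
Putting these steps together, a self-contained sketch of one asynchronous training iteration, with a separate optimizer and cost function per task-network; the TaskNet module, its sizes, and the way the shared hidden states are detached are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, K, M = 1000, 32, 64, 18, 127   # illustrative sizes

class TaskNet(nn.Module):
    """One task-network: a BLSTM encoder plus an output layer over [own ; shared] states."""
    def __init__(self, n_out):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.blstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.out = nn.Linear(4 * HID, n_out)
    def encode(self, x):
        h, _ = self.blstm(self.emb(x))
        return h                                 # (batch, n, 2*HID)
    def forward(self, x, h_other):
        h = self.encode(x)
        return self.out(torch.cat([h, h_other], dim=-1)), h

intent_net, slot_net = TaskNet(K), TaskNet(M)
opt1 = torch.optim.Adam(intent_net.parameters())     # separate optimizer per task-network
opt2 = torch.optim.Adam(slot_net.parameters())

x = torch.randint(0, VOCAB, (4, 7))                  # one toy batch
intent_gold = torch.randint(0, K, (4,))
slot_gold = torch.randint(0, M, (4, 7))

# Step 1: update the intent task-network on its own loss L1, reading h2 from the slot network.
h2 = slot_net.encode(x).detach()                     # shared states, no gradient across networks
intent_logits, _ = intent_net(x, h2)
loss_1 = F.cross_entropy(intent_logits[:, -1], intent_gold)   # intent taken at the last step
opt1.zero_grad(); loss_1.backward(); opt1.step()

# Step 2: update the slot task-network on its own loss L2 with the same batch, reading h1.
h1 = intent_net.encode(x).detach()
slot_logits, _ = slot_net(x, h1)
loss_2 = F.cross_entropy(slot_logits.reshape(-1, M), slot_gold.reshape(-1))
opt2.zero_grad(); loss_2.backward(); opt2.step()
print(loss_1.item(), loss_2.item())
```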

The reason for using the asynchronous training approach is the importance of keeping two separate cost functions for different tasks. Doing this has two main advantages:

  1. It filters the negative impact between the two tasks, in comparison to using only one joint model, by capturing more useful information and overcoming the structural limitation of one model.

  2. The cross-impact between the two tasks can only be learned by sharing the hidden states of the two models, which are trained using two cost functions separately.
