Abstract
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Multiple deep learning based models have demonstrated good results on these tasks.
The most effective algorithms are based on the structures of sequence-to-sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models (Yao et al., 2014; Mesnil et al., 2015; Peng and Yao, 2015; Kurata et al., 2016; Hahn et al., 2011) or a joint model (Liu and Lane, 2016a; Hakkani-Tür et al., 2016; Guo et al., 2014).
Most of the previous studies, however, either treat intent detection and slot filling as two separate parallel tasks, or use a sequence-to-sequence model to generate both semantic tags and intent.
Most of these approaches use one (joint) NN based model (including the encoder-decoder structure) to model the two tasks, and hence may not fully take advantage of the cross-impact between them.
In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTMs).
Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data (Hemphill et al., 1990; Tur et al., 2010), with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.
1 Introduction
Research on spoken language understanding (SLU) systems has progressed extremely fast during the past decades.
Two important tasks in an SLU system are intent detection and slot filling.
These two tasks are normally considered as parallel tasks but may have cross-impact on each other.
Intent detection is treated as an utterance classification problem, which can be modeled using conventional classifiers including regression, support vector machines (SVMs) or even deep neural networks (Haffner et al., 2003; Sarikaya et al., 2011).
The slot filling task can be formulated as a sequence labeling problem, and the most popular approaches with good performance use conditional random fields (CRFs) and recurrent neural networks (RNNs), as in recent works (Xu and Sarikaya, 2013).
Some works also suggested using one joint RNN model to generate the results of the two tasks together, by taking advantage of the sequence-to-sequence (Sutskever et al., 2014) (or encoder-decoder) model, which also gives decent results in the literature (Liu and Lane, 2016a).
In this paper, Bi-model based RNN structures are proposed to take the cross-impact between the two tasks into account, and hence can further improve the performance of modeling an SLU system.
These models can generate the intent and semantic tags concurrently for each utterance.
In our Bi-model structures, two task-networks are built for the purpose of intent detection and slot filling.
Each task-network includes one BLSTM with or without an LSTM decoder (Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005).
The paper is organized as follows: in section 2, a brief overview of existing deep learning approaches for intent detection and slot filling is given.
The newly proposed Bi-model based RNN approach is illustrated in detail in section 3.
In section 4, two experiments on different datasets will be given.
One is performed on the ATIS benchmark dataset, in order to demonstrate a state-of-the-art result for both semantic parsing tasks.
The other experiment is tested on our internal multi-domain dataset, comparing our new algorithm with the current best-performing RNN based joint model in the literature for intent detection and slot filling.
2 Background
In this section, a brief background overview on using deep learning and RNN based approaches to perform intent detection and slot filling tasks is given.
The joint model algorithm is also discussed for further comparison purposes.
2.1 Deep neural network for intent detection
Using deep neural networks for intent detection is similar to a standard classification problem; the only difference is that this classifier is trained under a specific domain.
For example, all data in the ATIS dataset is under the flight reservation domain with 18 different intent labels.
There are mainly two types of models that can be used: one is a feed-forward model that takes the average of all word vectors in an utterance as its input; the other uses a recurrent neural network that reads each word in an utterance as a vector one by one (Xu and Sarikaya, 2014).
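To make the two model types above concrete, here is a minimal PyTorch sketch (not from the original paper); the embedding size, hidden size, vocabulary size, and the 18-intent output layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, VOCAB, N_INTENTS = 100, 128, 10000, 18  # e.g. 18 ATIS intents


class AvgFeedForwardIntent(nn.Module):
    """Feed-forward model: average all word embeddings, then classify."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB_DIM)
        self.clf = nn.Sequential(nn.Linear(EMB_DIM, HID_DIM), nn.ReLU(),
                                 nn.Linear(HID_DIM, N_INTENTS))

    def forward(self, words):                      # words: (batch, seq_len)
        return self.clf(self.emb(words).mean(dim=1))


class RNNIntent(nn.Module):
    """RNN model: read the word vectors one by one, classify from the last state."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB_DIM)
        self.rnn = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.clf = nn.Linear(HID_DIM, N_INTENTS)

    def forward(self, words):
        _, (h_n, _) = self.rnn(self.emb(words))    # h_n: (1, batch, HID_DIM)
        return self.clf(h_n[-1])
```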
2.2 Recurrent Neural network for slot filling
The slot filling task is a bit different from intent detection as there are multiple outputs for the task, hence only an RNN model is a feasible approach for this scenario.
The most straightforward way is to use a single RNN model that generates multiple semantic tags sequentially by reading in each word one by one (Liu and Lane, 2015; Mesnil et al., 2015; Peng and Yao, 2015).
This approach has the constraint that the number of slot tags generated should be the same as the number of words in an utterance.
Another way to overcome this limitation is to use an encoder-decoder model containing two RNN models, one as an encoder for the input and the other as a decoder for the output (Liu and Lane, 2016a).
The advantage of doing this is that it gives the system the capability of matching an input utterance and output slot tags with different lengths without the need of alignment. Besides using RNNs, it is also possible to use a convolutional neural network (CNN) together with a conditional random field (CRF) for the slot filling task (Xu and Sarikaya, 2013).
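A minimal sketch of the straightforward single-RNN slot filler described above, again with assumed dimensions; it emits exactly one tag distribution per input word, which is the source of the length constraint mentioned earlier.

```python
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, VOCAB, N_TAGS = 100, 128, 10000, 127  # sizes are assumptions


class BLSTMSlotTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB_DIM)
        self.blstm = nn.LSTM(EMB_DIM, HID_DIM, bidirectional=True, batch_first=True)
        self.tag = nn.Linear(2 * HID_DIM, N_TAGS)   # forward + backward states

    def forward(self, words):                        # words: (batch, seq_len)
        h, _ = self.blstm(self.emb(words))           # h: (batch, seq_len, 2*HID_DIM)
        return self.tag(h)                           # one tag distribution per word
```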
2.3 Joint model for two tasks
It is also possible to use one joint model for intent detection and slot filling (Guo et al., 2014; Liu and Lane, 2016a,b; Zhang and Wang, 2016; Hakkani-Tür et al., 2016). One way is to use one encoder with two decoders: the first decoder generates sequential semantic tags and the second decoder generates the intent.
Another approach consolidates the hidden state information from an RNN slot filling model, then generates the intent using an attention model (Liu and Lane, 2016a).
Both approaches demonstrate very good results on the ATIS dataset.
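As a rough illustration of the joint-model idea (not any one cited system), the sketch below uses a single shared BLSTM encoder whose per-word states feed a slot-tag output layer and whose last state feeds an intent output layer; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, VOCAB, N_TAGS, N_INTENTS = 100, 128, 10000, 127, 18


class JointSLU(nn.Module):
    """One joint model: a shared encoder serves both tasks."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB_DIM)
        self.enc = nn.LSTM(EMB_DIM, HID_DIM, bidirectional=True, batch_first=True)
        self.slot_out = nn.Linear(2 * HID_DIM, N_TAGS)
        self.intent_out = nn.Linear(2 * HID_DIM, N_INTENTS)

    def forward(self, words):
        h, _ = self.enc(self.emb(words))           # (batch, seq_len, 2*HID_DIM)
        slot_logits = self.slot_out(h)             # one tag per word
        intent_logits = self.intent_out(h[:, -1])  # intent from the last state
        return slot_logits, intent_logits          # one model shared by both tasks
```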
3 Bi-model RNN structures for joint semantic frame parsing
Despite the success of RNN based sequence-to-sequence (or encoder-decoder) models on both tasks, most of the approaches in the literature still use a single RNN model for each task or for both tasks.
They treat intent detection and slot filling as two separate tasks.
In this section, two new Bi-model structures are proposed to take their cross-impact into account, hence further improving their performance.
One structure takes advantage of a decoder structure and the other does not.
An asynchronous training approach based on two models’ cost functions is designed to adapt to these new structures.
3.1 Bi-model RNN Structures
A graphical illustration of two Bi-model structures with and without a decoder is shown in Figure 1.
The two structures are quite similar to each other except that Figure 1a contains an LSTM based decoder, hence there is an extra decoder state $s_t$ to be cascaded besides the encoder state $h_t$.
Remarks:
The concept of using information from multiple models/multi-modal sources to achieve better performance has been widely used in deep learning (Dean et al., 2012; Wang, 2017; Ngiam et al., 2011; Srivastava and Salakhutdinov, 2012), system identification (Murray-Smith and Johansen, 1997; Narendra et al., 2014, 2015), and also the reinforcement learning field recently (Narendra et al., 2016; Wang and Jin, 2018).
Instead of using collective information, in this paper, our work introduces a totally new approach of training multiple neural networks asynchronously by sharing their internal state information.
3.1.1 Bi-model structure with a decoder
The Bi-model structure with a decoder is shown in Figure 1a.
There are two inter-connected bidirectional LSTMs (BLSTMs) in the structure: one is for intent detection and the other is for slot filling.
Each BLSTM reads in the input utterance sequence $(x_1, x_2, \ldots, x_n)$ forward and backward, and generates two sequences of hidden states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$.
A concatenation of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ forms a final BLSTM state $h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]$ at time step $t$.
Hence, our bidirectional LSTM $f_i(\cdot)$ generates a sequence of hidden states $(h^i_1, h^i_2, \ldots, h^i_n)$, where $i = 1$ corresponds to the network for the intent detection task and $i = 2$ is for the slot filling task.
In order to detect the intent, hidden state $h^1_t$ is combined together with $h^2_t$ from the other bidirectional LSTM $f_2(\cdot)$ in the slot filling task-network to generate the state of $g_1(\cdot)$, $s^1_t$, at time step $t$:

$$s^1_t = \phi(s^1_{t-1}, h^1_{t-1}, h^2_{t-1})$$
$$y^1_{intent} = \arg\max_{\hat{y}^1_n} P(\hat{y}^1_n \mid s^1_{n-1}, h^1_{n-1}, h^2_{n-1})$$

where $y^1_{intent}$ contains the predicted probabilities for all intent labels at the last time step $n$.
For the slot filling task, a similar network structure is constructed with a BLSTM $f_2(\cdot)$ and an LSTM $g_2(\cdot)$. $f_2(\cdot)$ is the same as $f_1(\cdot)$, reading in the word sequence as its input.
The difference is that there will be an output at each time step t for g2(·), as it is a sequence labeling problem.
At each step $t$:

$$s^2_t = \psi(h^2_t, h^1_t, s^2_{t-1}, y^2_{t-1})$$
$$y^2_t = \arg\max_{\hat{y}^2_t} P(\hat{y}^2_t \mid h^1_t, h^2_t, s^2_{t-1}, y^2_{t-1})$$

where $y^2_t$ is the predicted semantic tag at time step $t$.
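The following PyTorch sketch illustrates one possible reading of the Bi-model with a decoder; it is not the authors' code. Two BLSTMs encode the same utterance, and each task's LSTM decoder cell consumes the concatenation of both networks' hidden states, which is where the cross-impact is shared. The exact time indexing, the way the previous tag $y^2_{t-1}$ is fed back, and the state initialisation are simplifying assumptions.

```python
import torch
import torch.nn as nn

EMB, HID, VOCAB, N_TAGS, N_INTENTS = 100, 128, 10000, 127, 18  # assumed sizes


class BiModelWithDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.f1 = nn.LSTM(EMB, HID, bidirectional=True, batch_first=True)  # intent BLSTM
        self.f2 = nn.LSTM(EMB, HID, bidirectional=True, batch_first=True)  # slot BLSTM
        self.g1 = nn.LSTMCell(4 * HID, HID)            # intent decoder over [h1; h2]
        self.g2 = nn.LSTMCell(4 * HID + N_TAGS, HID)   # slot decoder, also sees y2_{t-1}
        self.intent_out = nn.Linear(HID, N_INTENTS)
        self.slot_out = nn.Linear(HID, N_TAGS)

    def forward(self, words):                          # words: (batch, seq_len)
        e = self.emb(words)
        h1, _ = self.f1(e)                             # (batch, T, 2*HID)
        h2, _ = self.f2(e)
        batch, T, _ = h1.shape
        s1 = c1 = s2 = c2 = e.new_zeros(batch, HID)
        y_prev = e.new_zeros(batch, N_TAGS)
        slot_logits = []
        for t in range(T):
            shared = torch.cat([h1[:, t], h2[:, t]], dim=-1)       # cross-impact
            s1, c1 = self.g1(shared, (s1, c1))                     # intent decoder state
            s2, c2 = self.g2(torch.cat([shared, y_prev], -1), (s2, c2))
            tag_t = self.slot_out(s2)                              # one slot tag per step
            y_prev = torch.softmax(tag_t, dim=-1)
            slot_logits.append(tag_t)
        intent_logits = self.intent_out(s1)                        # intent at last step n
        return torch.stack(slot_logits, dim=1), intent_logits
```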
3.1.2 Bi-Model structure without a decoder
The Bi-model structure without a decoder is shown in Figure 1b.
In this model, there is no LSTM decoder as in the previous model.
For the intent task, only one predicted output label $y^1_{intent}$ is generated from BLSTM $f_1(\cdot)$ at the last time step $n$, where $n$ is the length of the utterance.
Similarly, the state value $h^1_t$ and output intent label $y^1_{intent}$ are generated as:

$$h^1_t = \phi(h^1_{t-1}, h^2_{t-1})$$
$$y^1_{intent} = \arg\max_{\hat{y}^1_n} P(\hat{y}^1_n \mid h^1_{n-1}, h^2_{n-1})$$
For the slot filling task, the basic structure of BLSTM $f_2(\cdot)$ is similar to that for the intent detection task $f_1(\cdot)$, except that there is one slot tag label $y^2_t$ generated at each time step $t$.
It also takes the hidden states from the two BLSTMs $f_1(\cdot)$ and $f_2(\cdot)$, i.e. $h^1_{t-1}$ and $h^2_{t-1}$, plus the output tag $y^2_{t-1}$, to generate its next state value $h^2_t$ and also the slot tag $y^2_t$. To represent this as a function mathematically:

$$h^2_t = \psi(h^2_{t-1}, h^1_{t-1}, y^2_{t-1})$$
$$y^2_t = \arg\max_{\hat{y}^2_t} P(\hat{y}^2_t \mid h^1_{t-1}, h^2_{t-1}, y^2_{t-1})$$
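A corresponding sketch of the decoder-less variant: the LSTMCell decoders are dropped and the intent and slot tags are predicted directly from the concatenated BLSTM states. The feedback of the previous tag is omitted here for brevity, and sizes are again assumptions.

```python
import torch
import torch.nn as nn

EMB, HID, VOCAB, N_TAGS, N_INTENTS = 100, 128, 10000, 127, 18


class BiModelNoDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.f1 = nn.LSTM(EMB, HID, bidirectional=True, batch_first=True)
        self.f2 = nn.LSTM(EMB, HID, bidirectional=True, batch_first=True)
        self.intent_out = nn.Linear(4 * HID, N_INTENTS)   # reads [h1_n; h2_n]
        self.slot_out = nn.Linear(4 * HID, N_TAGS)        # reads [h1_t; h2_t]

    def forward(self, words):
        e = self.emb(words)
        h1, _ = self.f1(e)
        h2, _ = self.f2(e)
        shared = torch.cat([h1, h2], dim=-1)               # (batch, T, 4*HID)
        slot_logits = self.slot_out(shared)                # one tag per word
        intent_logits = self.intent_out(shared[:, -1])     # intent at last step n
        return slot_logits, intent_logits
```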
3.1.3 Asynchronous training
One of the major differences in the Bi-model structure is its asynchronous training, which trains two task-networks based on their own cost functions in an asynchronous manner.
The loss function for the intent detection task-network is $\mathcal{L}_1$, and the one for slot filling is $\mathcal{L}_2$. $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined using cross entropy as:

$$\mathcal{L}_1 \triangleq -\sum_{i=1}^{k} \hat{y}^{1,i}_{intent} \log\left(y^{1,i}_{intent}\right)$$

and

$$\mathcal{L}_2 \triangleq -\sum_{j=1}^{n} \sum_{i=1}^{m} \hat{y}^{2,i}_{j} \log\left(y^{2,i}_{j}\right)$$

where $k$ is the number of intent label types, $m$ is the number of semantic tag types, and $n$ is the number of words in a word sequence.
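In a typical implementation, the two cross-entropy costs could be computed with a standard library call, for example as below; tensor shapes are assumptions, and note that `F.cross_entropy` averages rather than sums by default.

```python
import torch
import torch.nn.functional as F

N_TAGS, N_INTENTS, BATCH, SEQ = 127, 18, 4, 10

# stand-ins for model outputs and gold labels
intent_logits = torch.randn(BATCH, N_INTENTS)          # one intent prediction per utterance
slot_logits = torch.randn(BATCH, SEQ, N_TAGS)          # one tag prediction per word
intent_gold = torch.randint(0, N_INTENTS, (BATCH,))
slot_gold = torch.randint(0, N_TAGS, (BATCH, SEQ))

# L1: cross entropy over the k intent label types
loss_intent = F.cross_entropy(intent_logits, intent_gold)

# L2: cross entropy over the m tag types, accumulated over the n words
# (mean-reduced here; the equations above sum over the sequence)
loss_slot = F.cross_entropy(slot_logits.reshape(-1, N_TAGS), slot_gold.reshape(-1))
```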
In each training iteration, both the intent detection and slot filling networks generate a group of hidden states $h^1_t$ and $h^2_t$ from the models in the previous iteration.
The intent detection task-network reads in a batch of input data and the hidden states $h^2_t$ from the slot filling task-network, and generates the estimated intent labels.
The intent detection task-network computes its cost based on the loss function $\mathcal{L}_1$ and is trained on that.
Then the same batch of data is fed into the slot filling task-network together with the hidden states $h^1_t$ from the intent task-network, which further generates a batch of outputs $y^2_t$ for each time step.
Its cost value is then computed based on the cost function $\mathcal{L}_2$, and the network is further trained on that.
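A schematic of this asynchronous procedure is sketched below. It assumes the two task-networks expose a `forward(words, other_states)` interface returning logits plus their own hidden states; two separate optimizers keep the two cost functions independent, as described above.

```python
import torch.nn.functional as F


def train_epoch_async(intent_net, slot_net, opt_intent, opt_slot, train_loader, n_tags):
    """One pass of the asynchronous scheme: each task-network keeps its own
    optimizer and cost function; only (detached) hidden states are shared."""
    h2_prev = None                                   # slot-side states from the previous batch
    for words, intent_gold, slot_gold in train_loader:
        # 1) intent task-network: read the batch plus the slot network's states,
        #    compute L1 and update only the intent network's parameters.
        intent_logits, h1 = intent_net(words, h2_prev)
        loss1 = F.cross_entropy(intent_logits, intent_gold)
        opt_intent.zero_grad()
        loss1.backward()
        opt_intent.step()

        # 2) slot task-network: same batch plus the intent network's states,
        #    compute L2 and update only the slot network's parameters.
        slot_logits, h2 = slot_net(words, h1.detach())
        loss2 = F.cross_entropy(slot_logits.reshape(-1, n_tags), slot_gold.reshape(-1))
        opt_slot.zero_grad()
        loss2.backward()
        opt_slot.step()

        h2_prev = h2.detach()                        # handed to the next iteration
```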
The reason for using the asynchronous training approach is the importance of keeping two separate cost functions for the different tasks. Doing this has two main advantages:
It filters the negative impact between two tasks in comparison to using only one joint model, by capturing more useful information and overcoming the structural limitation of one model.
The cross-impact between two tasks can only be learned by sharing hidden states of two models, which are trained using two cost functions separately.