First, a look at how seq2seq works:
The encoder embeds the input sequence and feeds it through an RNN, learning to compress it into a fixed-size state vector S, which is handed to the decoder. The decoder likewise embeds its input, runs it through an RNN, and outputs the prediction.
Pros and cons:
This handles inputs and outputs of unequal length, as in text translation. But because everything passed from encoder to decoder must squeeze through that single fixed-size state vector S, the more information the input carries, the more is lost in compressing it into S; as the sequence grows longer, the loss keeps increasing. This is the main weakness of seq2seq, and it is why attention and a bi-directional encoder layer are introduced later.
1. The encoder takes two steps:
1.1 First, embed the input sequence; here we use tf.contrib.layers.embed_sequence.
Suppose we have a batch of 2 samples with sequence_length = 5, features = [[1,2,3,4,5],[6,7,8,9,10]]. Calling tf.contrib.layers.embed_sequence(features, vocab_size=n_words, embed_dim=10)
gives a 2 x 5 x 10 output, in which every id in features has been embedded into a 10-dimensional vector.
encoder_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)
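As a quick sanity check, here is a minimal sketch of the shape embed_sequence produces (assuming TensorFlow 1.x, where tf.contrib is still available; the vocab_size of 11 is chosen just to cover the toy ids above):

import tensorflow as tf

features = tf.constant([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])   # batch=2, sequence_length=5
embedded = tf.contrib.layers.embed_sequence(features, vocab_size=11, embed_dim=10)
print(embedded.shape)   # (2, 5, 10): every id becomes a 10-dimensional vector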
1.2 The embedded vectors are then fed into an RNN, which returns encoder_output and encoder_state.
def get_lstm_cell(rnn_size):
    lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                        initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
    return lstm_cell

cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_size) for _ in range(num_layers)])

encoder_output, encoder_state = tf.nn.dynamic_rnn(cell, encoder_embed_input,
                                                  sequence_length=source_sequence_length,
                                                  dtype=tf.float32)
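The seq2seq_model function in section 3 calls a get_encoder_layer function that is not written out separately there; a minimal sketch of it, assuming it simply wraps steps 1.1 and 1.2 above:

def get_encoder_layer(input_data, rnn_size, num_layers,
                      source_sequence_length, source_vocab_size,
                      encoding_embedding_size):
    # 1.1 embed the input sequence
    encoder_embed_input = tf.contrib.layers.embed_sequence(input_data,
                                                           source_vocab_size,
                                                           encoding_embedding_size)
    # 1.2 stacked LSTM encoder
    def get_lstm_cell(rnn_size):
        return tf.contrib.rnn.LSTMCell(rnn_size,
                                       initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))

    cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_size) for _ in range(num_layers)])
    encoder_output, encoder_state = tf.nn.dynamic_rnn(cell, encoder_embed_input,
                                                      sequence_length=source_sequence_length,
                                                      dtype=tf.float32)
    return encoder_output, encoder_state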
2. The decoder takes three steps
2.1 Preprocess the target data
Why does this step need preprocessing?
- On the left, the encoder (red box) is simple: A, B, C are fused into one output
- On the right, the decoder (red box) receives that output and feeds it through each RNN step to decode
- <GO> is the start-of-decoding token, <EOS> is the end-of-decoding token
So the preprocessing is applied to the target sequence that will be fed into the decoder (prepend <GO>, drop the trailing <EOS>), using tf.strided_slice().
def process_decoder_input(data, vocab_to_int, batch_size):
    '''
    Prepend <GO> and drop the last token
    '''
    # cut off the last token of every sequence
    ending = tf.strided_slice(data, [0, 0], [batch_size, -1], [1, 1])
    decoder_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)
    return decoder_input
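A quick toy check of what this does (the vocabulary mapping here is made up purely for illustration, with '<GO>' = 0 and '<EOS>' = 3):

with tf.Session() as sess:
    targets = tf.constant([[5, 6, 7, 3],
                           [8, 9, 10, 3]], dtype=tf.int32)
    print(sess.run(process_decoder_input(targets, {'<GO>': 0, '<EOS>': 3}, batch_size=2)))
    # [[0 5 6 7]
    #  [0 8 9 10]]   <- trailing <EOS> removed, <GO> prepended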
2.2 Embed the target data
target_vocab_size = len(target_letter_to_int)
# the embedding matrix is created as an explicit variable (rather than via embed_sequence)
# so that the prediction decoder's GreedyEmbeddingHelper below can reuse the same matrix
decoder_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
decoder_embed_input = tf.nn.embedding_lookup(decoder_embeddings, decoder_input)
2.3 Feed the processed data into the RNN and get both the training output and the prediction output (the code from here to the end of this step is the body of the decoding_layer function called in section 3).
def get_decoder_cell(rnn_size):
    decoder_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
    return decoder_cell

cell = tf.contrib.rnn.MultiRNNCell([get_decoder_cell(rnn_size) for _ in range(num_layers)])

# Dense (from tensorflow.python.layers.core) projects the decoder's RNN output to logits over the target vocabulary
from tensorflow.python.layers.core import Dense
output_layer = Dense(target_vocab_size,
                     kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
Training Decoder
with tf.variable_scope("decode"):
    # build the helper object: TrainingHelper feeds the ground-truth target at each step (teacher forcing)
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=decoder_embed_input,
                                                        sequence_length=target_sequence_length,
                                                        time_major=False)
    # build the decoder
    training_decoder = tf.contrib.seq2seq.BasicDecoder(cell,
                                                       training_helper,
                                                       encoder_state,
                                                       output_layer)
    training_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                   impute_finished=True,
                                                                   maximum_iterations=max_target_sequence_length)
Prediction decoder
# shares parameters with the training decoder
with tf.variable_scope("decode", reuse=True):
    # create a constant start-token tensor and tile it to batch_size
    start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), [batch_size],
                           name='start_tokens')
    # at each step the GreedyEmbeddingHelper embeds the previous prediction and feeds it back,
    # starting from <GO> and stopping at <EOS>
    predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(decoder_embeddings,
                                                                 start_tokens,
                                                                 target_letter_to_int['<EOS>'])
    predicting_decoder = tf.contrib.seq2seq.BasicDecoder(cell,
                                                         predicting_helper,
                                                         encoder_state,
                                                         output_layer)
    predicting_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(predicting_decoder,
                                                                     impute_finished=True,
                                                                     maximum_iterations=max_target_sequence_length)

# return value of decoding_layer
return training_decoder_output, predicting_decoder_output
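The two objects returned here are BasicDecoderOutput namedtuples; training typically consumes the logits (rnn_output) and prediction consumes the sampled token ids (sample_id). A short sketch of that, with illustrative variable names:

training_logits = tf.identity(training_decoder_output.rnn_output, name='logits')
predicting_ids = tf.identity(predicting_decoder_output.sample_id, name='predictions')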
3. With the encoder and decoder in place, connect the two to form the seq2seq model
def seq2seq_model(input_data, targets, lr, target_sequence_length,
                  max_target_sequence_length, source_sequence_length,
                  source_vocab_size, target_vocab_size,
                  encoder_embedding_size, decoder_embedding_size,
                  rnn_size, num_layers):
    # get the encoder's state output
    _, encoder_state = get_encoder_layer(input_data,
                                         rnn_size,
                                         num_layers,
                                         source_sequence_length,
                                         source_vocab_size,
                                         encoder_embedding_size)
    # preprocessed decoder input
    decoder_input = process_decoder_input(targets, target_letter_to_int, batch_size)
    # pass the state vector and the decoder input to the decoder
    training_decoder_output, predicting_decoder_output = decoding_layer(target_letter_to_int,
                                                                        decoder_embedding_size,
                                                                        num_layers,
                                                                        rnn_size,
                                                                        target_sequence_length,
                                                                        max_target_sequence_length,
                                                                        encoder_state,
                                                                        decoder_input)
    return training_decoder_output, predicting_decoder_output
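A minimal usage sketch of seq2seq_model. The placeholder definitions and names below are assumptions of this sketch, and source_letter_to_int plus the hyperparameters (rnn_size, num_layers, encoding_embedding_size, decoding_embedding_size, batch_size) are assumed to be defined elsewhere in the notebook:

input_data = tf.placeholder(tf.int32, [None, None], name='inputs')
targets = tf.placeholder(tf.int32, [None, None], name='targets')
lr = tf.placeholder(tf.float32, name='learning_rate')
source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_len')

training_decoder_output, predicting_decoder_output = seq2seq_model(
    input_data, targets, lr, target_sequence_length,
    max_target_sequence_length, source_sequence_length,
    len(source_letter_to_int), len(target_letter_to_int),
    encoding_embedding_size, decoding_embedding_size,
    rnn_size, num_layers)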
This is a simple seq2seq model that merely sorts the letters of a word, and the data processing is kept simple as well, which fits the goal of this post: to explain clearly what a seq2seq model is. The next post will apply the seq2seq model to a real English-French text-translation task.