Experiment 1: PTB dataset experiment
Tutorial: https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/
Data: http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
After downloading and extracting, the files under ./simple-examples/data are:
README
ptb.char.test.txt
ptb.char.train.txt
ptb.char.valid.txt
ptb.test.txt
ptb.train.txt
ptb.valid.txt
The ptb.*.txt files share one format: one sentence per line, with words separated by spaces; they serve as the training, validation, and test sets respectively.
The ptb.char.*.txt files share one format: characters are separated by spaces, and words are separated by "_".
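To make the two formats concrete, here is a minimal sketch using toy lines in the same layout (the sample text is illustrative, not actual corpus content):

```python
# Toy examples mimicking the two PTB file formats (not actual corpus lines).
word_line = "no it was n't black monday"   # ptb.*.txt: words separated by spaces
char_line = "n o _ i t _ w a s _ n ' t"    # ptb.char.*.txt: chars by spaces, words by "_"

# Word-level tokenization: split on whitespace.
words = word_line.split()

# Char-level: tokens are single characters; "_" marks a word boundary.
tokens = char_line.split()
recovered = "".join(" " if t == "_" else t for t in tokens)
```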
Code: https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py
Run:
cd models/rnn/ptb
python ptb_word_lm.py --data_path=./simple-examples/data/ --model medium
Training runs for 39 epochs; the output of the last two epochs is shown below:
Epoch: 38 Learning rate: 0.001
0.008 perplexity: 53.276 speed: 8650 wps
0.107 perplexity: 47.396 speed: 8614 wps
0.206 perplexity: 49.082 speed: 8635 wps
0.306 perplexity: 48.002 speed: 8643 wps
0.405 perplexity: 47.800 speed: 8646 wps
0.505 perplexity: 47.917 speed: 8649 wps
0.604 perplexity: 47.110 speed: 8650 wps
0.704 perplexity: 47.361 speed: 8651 wps
0.803 perplexity: 46.620 speed: 8652 wps
0.903 perplexity: 45.850 speed: 8652 wps
Epoch: 38 Train Perplexity: 45.906
Epoch: 38 Valid Perplexity: 88.246
Epoch: 39 Learning rate: 0.001
0.008 perplexity: 52.994 speed: 8653 wps
0.107 perplexity: 47.077 speed: 8655 wps
0.206 perplexity: 48.910 speed: 8493 wps
0.306 perplexity: 48.088 speed: 8545 wps
0.405 perplexity: 47.966 speed: 8573 wps
0.505 perplexity: 47.977 speed: 8589 wps
0.604 perplexity: 47.122 speed: 8601 wps
0.704 perplexity: 47.305 speed: 8609 wps
0.803 perplexity: 46.564 speed: 8615 wps
0.903 perplexity: 45.826 speed: 8620 wps
Epoch: 39 Train Perplexity: 45.873
Epoch: 39 Valid Perplexity: 88.185
Test Perplexity: 83.922
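The perplexity figures above are the exponential of the average per-word cross-entropy (in nats); a minimal sketch of the conversion:

```python
import math

def perplexity(avg_loss_nats):
    """Perplexity = exp(average negative log-likelihood per word)."""
    return math.exp(avg_loss_nats)

# The test perplexity of ~83.9 above corresponds to an average
# loss of about 4.43 nats per word.
```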
Training took about 70 minutes on a Tesla M40 (24 GB).
Other references:
http://www.cnblogs.com/edwardbi/p/5554353.html
Experiment 2: Char-RNN experiment
Code and tutorial: https://github.com/sherjilozair/char-rnn-tensorflow
Training data: the complete Sherlock Holmes stories (download link)
The download is a plain-text file, 66,766 lines in total. Following the tutorial, put it under ./data/sherlock and rename it input.txt.
Goal: train a language model, then sample sentences from it.
Training
python train.py --data_dir=./data/sherlock > 1.log 2>&1 &
Many parameters can be tuned; results are saved under ./save by default. Training took about 1 hour 22 minutes.
The default is 50 epochs, but with default parameters the training loss stopped decreasing after about 10 epochs, so you can pass --num_epochs 10.
Sampling
python sample.py --save_dir ./save -n 100
Output 100 characters:
Sample 1 (spaces included)
very occasion I could never see, this people, for if Lestrade to the Fingers for me. These pinded
Sample 2 (spaces included)
CHAPTER V CORA" 2I Uppard in his leggy. You will give she.
"But you
remember that
Sample 3 (spaces included)
CHAPTEBENII
But the pushfuit who had honour had danger with such an instrumented. This sprang
The sentences are not very fluent, but the words are mostly valid.
To improve the results, the corpus could be cleaned so that each input is a complete sentence, and different model hyperparameters could be tried.
Chinese data would be even more interesting; next time I'll train on a Chinese novel.
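For reference, sample.py draws each next character from the network's softmax distribution over the vocabulary. A minimal sketch of temperature-controlled sampling (the logits and temperature values here are illustrative, not taken from the trained model):

```python
import math
import random

def sample_char(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).

    Lower temperature sharpens the distribution toward the argmax;
    higher temperature makes sampling more uniform (and more creative).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

With a very low temperature this behaves almost like argmax decoding, which yields more repetitive but more word-like output than sampling at temperature 1.0.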