About IMDB
IMDB is a movie review dataset built into Keras. It has two parts: the reviews and their labels. Each label is simply positive (1) or negative (0).
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
print('train_data shape:{}, test_data shape:{}'.format(train_data.shape, test_data.shape))
print(train_data[0])
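The training set and the test set each contain 25,000 reviews. Each review is stored as a list of integers, and each integer stands for a word, much like a dictionary lookup.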
[1, 14, 22, 16, 43, 530, 973, 1622...]
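To make the integer representation concrete, here is a minimal sketch of the round trip between words and integers on a toy vocabulary. It assumes the Keras defaults for this dataset, where indices 0, 1, and 2 are reserved for padding, start of sequence, and unknown words, so real word ranks are shifted by 3. The `toy_word_index` dictionary and the `encode`/`decode` helpers are made up for illustration and are not part of Keras.

```python
# Toy stand-in for imdb.get_word_index(); the words and ranks are hypothetical.
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

PAD, START, UNKNOWN = 0, 1, 2   # reserved indices (Keras defaults)
INDEX_FROM = 3                  # real word ranks are shifted by this offset


def encode(text, word_index):
    """Turn a sentence into a list of integers, the way the dataset stores it."""
    encoded = [START]
    for word in text.lower().split():
        rank = word_index.get(word)
        encoded.append(UNKNOWN if rank is None else rank + INDEX_FROM)
    return encoded


def decode(indices, word_index):
    """Map integers back to words; reserved indices come out as '?'."""
    reverse = {rank: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i - INDEX_FROM, "?") for i in indices)


encoded = encode("The film was brilliant", toy_word_index)
print(encoded)                          # [1, 4, 5, 6, 7]
print(decode(encoded, toy_word_index))  # ? the film was brilliant
```

The leading `?` in the decoded string is the start-of-sequence marker, which has no word of its own.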
Decode the IMDB data into a human-readable form:
word_index = imdb.get_word_index()
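# invert the mapping: integer index -> word
re_word_index = dict([(value, key) for (key, value) in word_index.items()])
# indices 0, 1 and 2 are reserved (padding, start, unknown), hence the offset of 3
decode_review = ' '.join(re_word_index.get(i-3, '?') for i in train_data[0])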
print(decode_review)
Decoded result:
"this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert..."
Vectorizing the raw data
The raw data is a list of integer lists, which cannot be fed directly into a Keras model, so it has to be vectorized first.
import numpy as np
def process_data(sequences, dim):  # vectorize the data (multi-hot encoding)
    ret = np.zeros((len(sequences), dim))  # dim will be set to 10000
    for i, it in enumerate(sequences):
        ret[i, it] = 1  # set every word index that occurs in review i to 1
    return ret
train_data = process_data(train_data, 10000)
test_data = process_data(test_data, 10000)
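To see what this encoding keeps and what it discards, here is a small sketch of the same technique on two tiny inputs. The `multi_hot` name and the toy sequences are illustrative only. The point is that word order and repeat counts are lost, so only the presence of a word survives.

```python
import numpy as np


def multi_hot(sequences, dim):
    """Return a (len(sequences), dim) matrix with 1.0 at every index that occurs."""
    out = np.zeros((len(sequences), dim))
    for row, indices in enumerate(sequences):
        out[row, indices] = 1.0  # fancy indexing sets all listed columns at once
    return out


# Two hypothetical "reviews" over a vocabulary of 8 indices.
# They use the same words in a different order, one with a repeated word.
toy = [[1, 3, 3, 5], [5, 3, 1]]
encoded = multi_hot(toy, dim=8)
print(encoded)
print((encoded[0] == encoded[1]).all())  # True: both rows are identical
```

One practical note: `np.zeros` defaults to `float64`, so a 25,000 x 10,000 matrix takes about 2 GB. Passing `dtype="float32"` to `np.zeros` would halve that if memory is tight.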
Building the network
Two dense layers with 16 units each, followed by one dense layer with a single unit that outputs a probability between 0 and 1.
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000, )))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
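As a rough illustration of what these three layers compute, the sketch below counts their parameters and runs a forward pass in plain NumPy. The weights are random placeholders rather than trained values, and the helper names are made up for this example. It only shows the shapes and the effect of the activations.

```python
import numpy as np


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


layer_sizes = [10000, 16, 16, 1]
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 160305


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Random weights stand in for trained ones; only the shapes matter here.
rng = np.random.default_rng(0)
params = [(rng.normal(scale=0.01, size=(a, b)), np.zeros(b))
          for a, b in zip(layer_sizes, layer_sizes[1:])]

# Two hypothetical multi-hot encoded reviews.
x = np.zeros((2, 10000))
x[0, [1, 14, 22]] = 1.0
x[1, [5, 9]] = 1.0

h = x
for i, (w, b) in enumerate(params):
    z = h @ w + b
    # relu on the hidden layers, sigmoid on the final layer
    h = sigmoid(z) if i == len(params) - 1 else relu(z)
print(h.shape)  # (2, 1): one probability per review
```

Almost all of the parameters sit in the first layer (10000 x 16 weights plus 16 biases), which is a direct consequence of the 10,000-dimensional input.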
Compiling the model
from keras import optimizers
from keras import losses
from keras import metrics
# recent Keras releases name this argument learning_rate; older ones used lr
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])
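For readers who want to see what the loss and the metric measure, here is a hand-written sketch of binary cross-entropy and binary accuracy in plain Python. It is not the Keras implementation, only the same formulas applied to a few made-up predictions. The 0.5 threshold matches the Keras default for binary accuracy.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int((p > threshold) == bool(y)) for y, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]

loss = binary_crossentropy(labels, probs)
acc = binary_accuracy(labels, probs)
print(round(loss, 4), acc)
```

The loss penalizes confident wrong answers much more than hesitant ones, whereas accuracy only checks which side of the threshold a prediction falls on.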
Training and testing the model
Earlier test runs showed that the model starts to overfit after 4 epochs of training, so epochs is set to 4 directly.
model.fit(train_data, train_labels, batch_size=512, epochs=4)
result = model.evaluate(test_data, test_labels)
print(result)
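The post does not show how the 4-epoch cutoff was found. One common approach, sketched here as an assumption, is to hold out part of the training data, record the validation loss after every epoch, and take the epoch where it is lowest. The loss values below are invented for illustration and were not measured on this model. With Keras, such a list would typically come from `history.history['val_loss']` after calling `model.fit` with `validation_data`.

```python
import math


def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Hypothetical validation losses, one per epoch; NOT measured from the model above.
made_up_val_loss = [0.40, 0.31, 0.29, 0.28, 0.30, 0.33, 0.37]
print(best_epoch(made_up_val_loss))  # 4

# Number of gradient updates per epoch for 25,000 samples and batch_size=512.
steps_per_epoch = math.ceil(25000 / 512)
print(steps_per_epoch)  # 49
```

After the chosen epoch the validation loss rises again while the training loss keeps falling, which is the usual sign of overfitting.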
On a ThinkPad laptop without a GPU, training took less than a minute and the accuracy reached 88%. The next step is to train an RNN (recurrent neural network) on the same dataset.