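Let's start with a basic binary classification problem and implement a perceptron with Keras. Keras ships several official datasets covering binary classification, multiclass classification, and regression: the IMDB review dataset is a binary classification problem, the Reuters dataset is a multiclass problem, and the house prices dataset is a regression problem.
train_data holds, for each review, the indices of the words that appear in it; train_labels and test_labels hold the reviewer's opinion of the movie, 0: negative, 1: positive.
Let's look at one positive and one negative review: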
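from keras.datasets import imdb

# Keep only the 10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
# word_index maps word -> index; invert it to map index -> word
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Indices are shifted by 3 because 0, 1 and 2 are reserved, hence i-3
good_review = ' '.join([reverse_word_index.get(i-3,'?') for i in train_data[0]])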
# Output
"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
# train_data[1] is a negative review (label 0); train_data[0] is the positive one shown above
bad_review = ' '.join([reverse_word_index.get(i-3,'?') for i in train_data[1]])
# Output
"? big hair big boobs bad music and a giant safety pin these are the words to best describe this terrible movie i love cheesy horror movies and i've seen hundreds but this had got to be on of the worst ever made the plot is paper thin and ridiculous the acting is an abomination the script is completely laughable the best is the end showdown with the cop and how he worked out who the killer is it's just so damn terribly written the clothes are sickening and funny in equal measures the hair is big lots of boobs bounce men wear those cut tee shirts that show off their stomachs sickening that men actually wore them and the music is just synthesiser trash that plays over and over again in almost every scene there is trashy music boobs and paramedics taking away bodies and the gym still doesn't close for bereavement all joking aside this is a truly bad film whose only charm is to look back on the disaster that was the 80's and have a good old laugh at how bad everything was back then"
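Both decoded reviews start with "?" because of the `i-3` offset: by default `imdb.load_data` shifts every word index by 3 and reserves 0, 1, and 2 for padding, the start-of-sequence marker, and unknown words. Here is a minimal sketch of that decoding step on a toy vocabulary. The `toy_word_index` dictionary and the `decode_review` helper are made up for illustration and are not part of Keras.

```python
# Toy stand-in for imdb.get_word_index(): word -> rank (1 = most frequent).
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

# Invert the mapping so that rank -> word.
reverse_toy_index = {rank: word for word, rank in toy_word_index.items()}


def decode_review(encoded, reverse_index, index_from=3):
    """Map encoded indices back to words; reserved indices become '?'."""
    return " ".join(reverse_index.get(i - index_from, "?") for i in encoded)


# 1 is the start-of-sequence marker; the other values are ranks shifted by 3.
encoded = [1, 4, 5, 6, 7]
decoded = decode_review(encoded, reverse_toy_index)
print(decoded)
```

The leading 1 maps to -2 after the shift, which is not in the vocabulary, so it is rendered as "?", just like the first token of the two reviews above.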
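Next we build a perceptron with two hidden layers to classify the reviews. Before classification, we preprocess the data with one-hot encoding: only the 10,000 most frequent words are considered, a position is 1 if the word appears in the sample and 0 if it does not, so each sample becomes a 10,000-dimensional vector. The processed data is then fed into the perceptron.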
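To make the encoding concrete, here is a small worked example on toy data, using a 5-word vocabulary instead of the full one. The `multi_hot` helper and the `toy_sequences` values are illustrative assumptions, not code from the original post.

```python
import numpy as np


def multi_hot(sequences, dimension):
    """Return a (len(sequences), dimension) matrix of 0s and 1s."""
    results = np.zeros((len(sequences), dimension))
    for row, sequence in enumerate(sequences):
        # Integer-list indexing sets every listed column of this row at once.
        results[row, sequence] = 1.0
    return results


# Two toy "reviews" written as word indices; index 3 occurs twice in the first.
toy_sequences = [[0, 3, 3], [1, 2]]
encoded = multi_hot(toy_sequences, dimension=5)
print(encoded)
```

Note that this representation records only whether a word occurs: repeated words and word order are discarded.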
import numpy as np
from keras.models import Sequential
from keras import layers
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_review = ' '.join([reverse_word_index.get(i-3,'?') for i in train_data[0]])
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
# define the model
model = Sequential()
model.add(layers.Dense(10, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
# 'mse' works here, though 'binary_crossentropy' is the more common loss for a sigmoid output
model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
# hold out the first 10,000 training samples as a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
history = model.fit(partial_x_train, partial_y_train, epochs=10, batch_size=512, validation_data=(x_val, y_val))
metrics = model.evaluate(x_test, y_test)
print(model.metrics_names)
print(metrics)
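The accuracy value printed above comes from thresholding the sigmoid output. As a rough sketch of that computation, assuming the usual 0.5 threshold, the snippet below turns probabilities into class labels and compares them with the true labels. The `accuracy_from_probabilities` helper and the toy arrays are hypothetical and only stand in for real model predictions.

```python
import numpy as np


def accuracy_from_probabilities(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predicted = (np.asarray(probabilities).ravel() > threshold).astype("float32")
    return float(np.mean(predicted == np.asarray(labels).ravel()))


# Toy sigmoid outputs with shape (n, 1), the layout a Dense(1) output layer produces.
toy_probabilities = np.array([[0.9], [0.2], [0.6], [0.4]])
toy_labels = np.array([1.0, 0.0, 0.0, 0.0])

toy_accuracy = accuracy_from_probabilities(toy_probabilities, toy_labels)
print(toy_accuracy)
```

With the trained model, the same helper could be applied to `model.predict(x_test)` and `y_test`.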