Author: shikanon
CreateTime: 2017-02-13 10:33:34
TensorFlow 1.0 has been officially released and Google's first TensorFlow Dev Summit was held in Mountain View; deep learning is riding a new wave of excitement. As deep-learning frameworks spread, more and more people will join the party, much as the spread of Android rapidly grew its developer base. Here is a short, beginner-friendly introduction to deep learning. It uses Keras, a high-level neural-network library that is also one of the high-level APIs introduced with TensorFlow 1.0.
- Contents
I. Basics
Each neuron in a neural network computes a weighted sum of all of its inputs, adds a constant called the bias, and then feeds the result through some nonlinear activation function.
For data we use the "Hello World" of deep learning, the MNIST handwritten-digit dataset, and we start learning with our first softmax model.
1. softmax
softmax is mainly used for multi-class classification; it is the generalization of the logistic regression model to more than two classes. The softmax formula:
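The formula image from the original post is not reproduced here; the standard definition for $k$ classes, given scores $z_1, \dots, z_k$, is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, \dots, k$$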
When k = 2, it reduces to the logistic-regression form.
softmax is usually placed as the last layer of a neural network, i.e. the output layer for multi-class classification. Every softmax output is >= 0 and the outputs sum to 1, so the output can be read as a probability distribution.
(Figure: softmax schematic)
(Figure: softmax output layer)
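Before wiring softmax into Keras, a minimal NumPy sketch (not part of the original notebook) makes the two properties above concrete: every output is non-negative and the outputs sum to 1.

```python
import numpy as np

def softmax(z):
    # Subtracting the max is only for numerical stability; it does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # roughly [0.659, 0.242, 0.099] -- all non-negative
print(probs.sum())  # 1.0
```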
%pylab inline
Populating the interactive namespace from numpy and matplotlib
from IPython.display import SVG
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Reshape
from keras.optimizers import SGD, Adam
from keras.utils.visualize_util import model_to_dot
from keras.utils import np_utils
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
Using TensorFlow backend.
# Set the random seed so the experiment is reproducible
import numpy as np
np.random.seed(0)
# Limit the number of TensorFlow threads.
# Note: a ConfigProto only takes effect when it is passed to the TensorFlow session
# that Keras uses (e.g. via keras.backend.set_session); on its own this line just
# builds the configuration object.
THREADS_NUM = 20
tf.ConfigProto(intra_op_parallelism_threads=THREADS_NUM)
(X_train, Y_train),(X_test, Y_test) = mnist.load_data()
print('Original data shapes:')
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
# Transform the data
# There are 10 classes
nb_classes = 10
x_train_1 = X_train.reshape(60000, 784)
# Pixel normalization is left disabled here, to match the results reported below;
# to enable it, cast to float first and then divide:
#x_train_1 = x_train_1.astype('float32')
#x_train_1 /= 255
y_train_1 = np_utils.to_categorical(Y_train, nb_classes)
print('Shapes after transformation:')
print(x_train_1.shape, y_train_1.shape)
x_test_1 = X_test.reshape(10000, 784)
y_test_1 = np_utils.to_categorical(Y_test, nb_classes)
print(x_test_1.shape, y_test_1.shape)
Original data shapes:
((60000, 28, 28), (60000,))
((10000, 28, 28), (10000,))
Shapes after transformation:
((60000, 784), (60000, 10))
((10000, 784), (10000, 10))
# Build a softmax model
# neural network with 1 layer of 10 softmax neurons
#
# · · · · · · · · · · (input data, flattened pixels) X [batch, 784] # 784 = 28 * 28
# \x/x\x/x\x/x\x/x\x/ -- fully connected layer (softmax) W [784, 10] b[10]
# · · · · · · · · Y [batch, 10]
# The model is:
#
# Y = softmax( X * W + b)
# X: matrix for 100 grayscale images of 28x28 pixels, flattened (there are 100 images in a mini-batch)
# W: weight matrix with 784 lines and 10 columns
# b: bias vector with 10 dimensions
# +: add with broadcasting: adds the vector to each line of the matrix (numpy)
# softmax(matrix) applies softmax on each line
# softmax(line) applies an exp to each value then divides by the norm of the resulting line
# Y: output matrix with 100 lines and 10 columns
model = Sequential()
model.add(Dense(nb_classes, input_shape=(784,)))  # fully connected: 784 inputs, 10 outputs; the shapes must match the data
model.add(Activation('softmax'))
sgd = SGD(lr=0.005)
# binary_crossentropy is a cross-entropy loss; for a 10-class one-hot target,
# categorical_crossentropy is the usual choice. binary_crossentropy is kept here to
# match the results reported below, but note that it can inflate the accuracy metric.
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
# Model summary
model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_1 (Dense) (None, 10) 7850 dense_input_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 10) 0 dense_1[0][0]
====================================================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
____________________________________________________________________________________________________
SVG(model_to_dot(model).create(prog='dot', format='svg'))
from keras.callbacks import Callback, TensorBoard
import tensorflow as tf
# A callback that records the loss of every batch
class LossHistory(Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
# A custom TensorBoard callback that logs how the metrics change per batch
class BatchTensorBoard(TensorBoard):
    def __init__(self, log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=False):
        super(BatchTensorBoard, self).__init__()
        self.log_dir = log_dir
        self.histogram_freq = histogram_freq
        self.merged = None
        self.write_graph = write_graph
        self.write_images = write_images
        self.batch = 0
        self.batch_queue = set()

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.batch = self.batch + 1
        for name, value in logs.items():
            if name in ['batch', 'size']:
                continue
            summary = tf.Summary()
            summary_value = summary.value.add()
            summary_value.simple_value = float(value)
            summary_value.tag = "batch_" + name
            if (name, self.batch) in self.batch_queue:
                continue
            self.writer.add_summary(summary, self.batch)
            self.batch_queue.add((name, self.batch))
        self.writer.flush()
tensorboard = TensorBoard(log_dir='/home/tensorflow/log/softmax/epoch')
my_tensorboard = BatchTensorBoard(log_dir='/home/tensorflow/log/softmax/batch')
model.fit(x_train_1, y_train_1,
nb_epoch=20,
verbose=0,
batch_size=100,
callbacks=[tensorboard, my_tensorboard])
<keras.callbacks.History at 0xa86d650>
2. Loss function
A loss function maps an event (an element of a sample space) to a real number that represents the economic or opportunity cost associated with that event; in statistics, a loss function measures the degree of loss or error incurred by estimating "wrongly" (such as a cost or the loss of equipment).
Cross-entropy is a loss function commonly used in neural networks.
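The formula image is not reproduced here; for a one-hot target $y$ and a predicted distribution $a$, the categorical cross-entropy is

$$H(y, a) = -\sum_{i} y_i \log a_i$$

and in the binary case it reduces to $C = -\big[\,y \ln a + (1-y)\ln(1-a)\,\big]$, which is the form the properties below refer to.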
Properties of cross-entropy:
(1) It is non-negative.
(2) When the actual output a is close to the desired output y, the cost approaches 0 (for example, when y=0 and a≈0, or y=1 and a≈1, the cost is close to 0).
A simple intuition: the closer the predictions Yi are to the ground truth Y', i.e. the larger the product of the corresponding entries, the smaller the cross-entropy.
The cross-entropy and accuracy curves can be viewed in TensorBoard, using the log directories written above.
Gradient descent
If we compute the partial derivative of the cross-entropy with respect to every weight and every bias, we obtain a "gradient" for the given images, labels, and current weights and biases, as shown in the figure:
We want the loss to be as small as possible, i.e. to reach the bottom of the valley where the cross-entropy is minimal. In the figure above, the cross-entropy is plotted as a function of two weights.
The learning rate is simply the size of each step taken during gradient descent.
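In symbols (a standard statement, not copied from the original post), one gradient-descent step moves every weight and bias against its partial derivative, with the learning rate $\eta$ as the step size:

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$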
# The names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
verbose=1,
batch_size=100)
['loss', 'acc']
9800/10000 [============================>.] - ETA: 0s
[0.87580669939517974, 0.94387999653816224]
Above, we explored how softmax supports multi-class classification and saw that a softmax output layer can handle multi-class tasks.
However, such a model only solves problems shaped by linear factors. What about nonlinear problems, in particular the XOR problem?
This is where deep neural networks with multiple hidden layers come in.
3. Activation functions
An activation function is what allows a model to capture nonlinearity.
There are two ways to tackle a nonlinear problem: transforming the coordinates, or introducing a nonlinear function.
(1) Coordinate transformation
Take a model that is not linearly separable, such as X^2 + Y^2 = 1.
Its plot looks like this:
fig = plt.figure(0)
degree = np.random.rand(50)*np.pi*2
x_1 = np.cos(degree)*np.random.rand(50)
y_1 = np.sin(degree)*np.random.rand(50)
x_2 = np.cos(degree)*(1+np.random.rand(50))
y_2 = np.sin(degree)*(1+np.random.rand(50))
# x_3 and y_3 trace the separating circle
t = np.linspace(0,np.pi*2,50)
x_3 = np.cos(t)
y_3 = np.sin(t)
scatter(x_1,y_1,c='red',s=50,alpha=0.4,marker='o')
scatter(x_2,y_2,c='black',s=50,alpha=0.4,marker='o')
plot(x_3,y_3)
Now lift the coordinates to higher-order features: let the horizontal axis be X^2 and the vertical axis be Y^2. The equation becomes X + Y = 1, so the original nonlinear problem turns into a linearly separable one, a simple linear equation.
This is illustrated below:
fig2 = plt.figure(1)
# The new horizontal coordinate is x^2 and the new vertical coordinate is y^2
x_4 = x_1**2
y_4 = y_1**2
x_5 = x_2**2
y_5 = y_2**2
# Now a single straight line (a linear equation) separates the two sets
x_6 = np.linspace(-1,2,50)
y_6 = 1 - x_6
scatter(x_4,y_4,c='red',s=50,alpha=0.4,marker='o')
scatter(x_5,y_5,c='black',s=50,alpha=0.4,marker='o')
plot(x_6,y_6)
(2) Introducing a nonlinear function
XOR is a bitwise operation on binary numbers, written XOR (the Python operator is ^). For each bit, the result is 0 when the two bits are equal and 1 when they differ.
Here is a typical XOR truth table:
table = {'x':[1,0,1,0],'y':[1,0,0,1]}
df = pd.DataFrame(table)
df['z'] = df['x']^df['y']
df
x = 1, y = 1, then z = 0
x = 0, y = 0, then z = 0
x = 1, y = 0, then z = 1
x = 0, y = 1, then z = 1
...
Its plot:
fig3 = plt.figure(2)
groups = df.groupby('z')
for name, group in groups:
scatter(group['x'],group['y'],label=name,s=50,marker='o')
So can we build a function that fits this pattern, i.e. construct an f() such that f(x, y) = z?
To do so, let's build a two-layer neural network with two activation functions, F(x, y) and H(x, y), as shown in the figure below:
F(x, y) is a threshold function with threshold 1:
that is, F(x, y) = 1 when AX + BY > 1, and 0 otherwise;
if A*X + B*Y > 1:
    F = 1
else:
    F = 0
H(x, y) is a threshold function with threshold 0:
if A*X + B*Y > 0:
    H = 1
else:
    H = 0
The numbers on the edges in the figure are the weights.
- For the point (1,1), the hidden-layer values from left to right are (1,1,1), and the final output is (1,1,1)·(1,-2,1) = 0;
- For the point (0,0), the hidden-layer values from left to right are (0,0,0), and the final output is (0,0,0)·(1,-2,1) = 0;
- For the point (1,0), the hidden-layer values from left to right are (1,0,0), and the final output is (1,0,0)·(1,-2,1) = 1;
- For the point (0,1), the hidden-layer values from left to right are (0,0,1), and the final output is (0,0,1)·(1,-2,1) = 1.
first_hidden_layer_table = {'x': [1, 0, 1, 0], 'y': [1, 0, 0, 0], 'z': [1, 0, 0, 1], 'output': [0, 0, 1, 1]}
first_hidden_layer_data = pd.DataFrame(first_hidden_layer_table)
first_hidden_layer_data
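As a cross-check (not in the original notebook), here is a minimal NumPy sketch of the two-layer threshold network described above. The hidden-unit weights and thresholds are the ones implied by the bullet list and the functions F and H; the output weights are (1, -2, 1).

```python
import numpy as np

def threshold(v, t):
    # Threshold unit: outputs 1 where v > t, otherwise 0.
    return (v > t).astype(int)

x = np.array([1, 0, 1, 0])
y = np.array([1, 0, 0, 1])

# Hidden layer: three threshold units.
h1 = threshold(x, 0)      # H with weights (1, 0): fires when x > 0
h2 = threshold(x + y, 1)  # F with weights (1, 1): fires when x + y > 1
h3 = threshold(y, 0)      # H with weights (0, 1): fires when y > 0

# Output layer with weights (1, -2, 1) reproduces XOR.
z = 1 * h1 - 2 * h2 + 1 * h3
print(z)      # [0 0 1 1]
print(x ^ y)  # [0 0 1 1] -- identical
```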
With this, we have constructed a function that fits the XOR pattern.
Look at the first hidden layer: it has three units and three outgoing weights. Going from the input layer to the first hidden layer, we are really mapping a two-dimensional input into a three-dimensional representation, which is what makes a linear separation possible.
A graphical explanation:
from mpl_toolkits.mplot3d import Axes3D
fig4 = plt.figure(3)
ax = fig4.add_subplot(111, projection='3d')
groups = first_hidden_layer_data.groupby('output')
for name, group in groups:
    ax.scatter(group['x'], group['y'], group['z'], label=name, c=np.random.choice(['black', 'blue']), s=50, marker='o')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
After this transformation the data are linearly separable (in n dimensions; in this example a plane can separate the two colours of points).
For more experiments, try the small in-browser neural-network playground provided by TensorFlow; adjusting its parameters yourself is a good way to build a deeper intuition for neural networks and activation functions.
Demo URL:
http://playground.tensorflow.org/
You can build a small neural network there to help your understanding.
4. sigmoid
The sigmoid is the S-shaped logistic curve used for binary classification.
The sigmoid formula:
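The formula image is not reproduced here; the standard definition is

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$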
The sigmoid curve:
It squashes the two tails and is sensitive to small changes around the middle, which is why the sigmoid is the simplest and most commonly used activation layer in neural networks.
Advantages:
(1) The output range is (0, 1), so values do not easily blow up as they propagate.
(2) It is monotonically increasing.
(3) It is easy to differentiate.
The sigmoid also has a drawback: during backpropagation it easily produces vanishing gradients. Near the saturated regions its derivative approaches 0 and learning becomes very slow. For this reason we choose the Adam optimizer here.
Adam is also a gradient-based method, but at each iteration the effective step size stays within a bounded range, so a very large gradient does not produce a very large step and the parameter updates remain stable. This lowers the risk of the model getting stuck in a poor local optimum, whereas plain SGD converges to a local optimum more easily: if you change the optimizer in the code below to SGD, the accuracy stops improving after one epoch and the model gets stuck in a local optimum.
# Build a five-layer fully connected network with sigmoid activations
# neural network with 5 layers
#
# · · · · · · · · · · (input data, flattened pixels) X [batch, 784] # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/ -- fully connected layer (sigmoid) W1 [784, 200] B1[200]
# · · · · · · · · · Y1 [batch, 200]
# \x/x\x/x\x/x\x/ -- fully connected layer (sigmoid) W2 [200, 100] B2[100]
# · · · · · · · Y2 [batch, 100]
# \x/x\x/x\x/ -- fully connected layer (sigmoid) W3 [100, 60] B3[60]
# · · · · · Y3 [batch, 60]
# \x/x\x/ -- fully connected layer (sigmoid) W4 [60, 30] B4[30]
# · · · Y4 [batch, 30]
# \x/ -- fully connected layer (softmax) W5 [30, 10] B5[10]
# · Y5 [batch, 10]
model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 inputs, 200 outputs; the input shape must match the data
model.add(Activation('sigmoid'))
model.add(Dense(100))  # only the first layer needs input_shape; later layers infer their input size from the previous layer
model.add(Activation('sigmoid'))
model.add(Dense(60))
model.add(Activation('sigmoid'))
model.add(Dense(30))  # Dense and Activation can also be combined into a single call,
model.add(Activation('sigmoid'))  # e.g. model.add(Dense(30, activation='sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
adam = Adam(lr=0.003)
model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
# Model summary
model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_23 (Dense) (None, 200) 157000 dense_input_7[0][0]
____________________________________________________________________________________________________
activation_23 (Activation) (None, 200) 0 dense_23[0][0]
____________________________________________________________________________________________________
dense_24 (Dense) (None, 100) 20100 activation_23[0][0]
____________________________________________________________________________________________________
activation_24 (Activation) (None, 100) 0 dense_24[0][0]
____________________________________________________________________________________________________
dense_25 (Dense) (None, 60) 6060 activation_24[0][0]
____________________________________________________________________________________________________
activation_25 (Activation) (None, 60) 0 dense_25[0][0]
____________________________________________________________________________________________________
dense_26 (Dense) (None, 30) 1830 activation_25[0][0]
____________________________________________________________________________________________________
activation_26 (Activation) (None, 30) 0 dense_26[0][0]
____________________________________________________________________________________________________
dense_27 (Dense) (None, 10) 310 activation_26[0][0]
____________________________________________________________________________________________________
activation_27 (Activation) (None, 10) 0 dense_27[0][0]
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________
SVG(model_to_dot(model).create(prog='dot', format='svg'))
tensorboard2 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_sigmoid/epoch', histogram_freq=0)
my_tensorboard2 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_sigmoid/batch')
model.fit(x_train_1, y_train_1,
nb_epoch=20,
verbose=0,
batch_size=100,
callbacks=[my_tensorboard2, tensorboard2])
<keras.callbacks.History at 0xf868a90>
# The names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
verbose=1,
batch_size=100)
['loss', 'acc']
9800/10000 [============================>.] - ETA: 0s
[0.036339853547979147, 0.98736999988555907]
From the comparison above, the deeper network performs noticeably better than the single softmax layer.
However, in deep networks the sigmoid easily causes vanishing gradients during backpropagation, which can prevent the deeper layers from training at all. Near its saturated regions the sigmoid changes very slowly, its derivative approaches 0, and convergence slows down.
5. ReLU
ReLU is motivated by research on the sparsity of activity in biological neurons: Lennie, P. (2003) estimated that roughly 95%-99% of the brain's neurons are idle at any given time, and fewer active neurons mean lower computational complexity and less overfitting.
The rectified linear unit (ReLU) formula:
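The formula image is not reproduced here; the standard definition is

$$\mathrm{ReLU}(x) = \max(0, x)$$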
Its plot:
ReLU is piecewise linear and non-saturating, and its non-saturation lets the network introduce sparsity on its own.
Using ReLU addresses the slow gradient descent of the sigmoid and the loss of information in deep networks.
ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron may update the weights so that the neuron never activates on any data point again. If that happens, the gradient flowing through that unit will be zero forever; in other words, a ReLU can die irreversibly during training and disrupt the data manifold. If the learning rate is too high, a large part of the network will "die" (neurons that never activate during the entire training run). Choosing an appropriate learning rate largely avoids this problem.
6. Learning rate
As mentioned in the gradient-descent discussion above, the learning rate is simply the size of each gradient-descent step. To reach the bottom of the valley we need to control that step size, i.e. the learning rate.
How large to make the learning rate is usually decided by watching how the loss changes.
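As an illustration of adjusting the learning rate during training, Keras exposes a LearningRateScheduler callback. This is only a sketch with a made-up exponential-decay schedule (lr_max, lr_min and decay_speed are hypothetical values, not the ones used in mnist_2.1_five_layers_relu_lrdecay.py):

```python
from keras.callbacks import LearningRateScheduler
import numpy as np

# Hypothetical schedule: start at lr_max and decay exponentially towards lr_min.
def lr_schedule(epoch, lr_max=0.003, lr_min=0.0001, decay_speed=20.0):
    return float(lr_min + (lr_max - lr_min) * np.exp(-epoch / decay_speed))

lr_decay = LearningRateScheduler(lr_schedule)
# Pass it alongside the other callbacks:
# model.fit(x_train_1, y_train_1, nb_epoch=30, batch_size=100, callbacks=[lr_decay, ...])
```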
# neural network with 5 layers
#
# · · · · · · · · · · (input data, flattened pixels) X [batch, 784] # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/ -- fully connected layer (relu) W1 [784, 200] B1[200]
# · · · · · · · · · Y1 [batch, 200]
# \x/x\x/x\x/x\x/ -- fully connected layer (relu) W2 [200, 100] B2[100]
# · · · · · · · Y2 [batch, 100]
# \x/x\x/x\x/ -- fully connected layer (relu) W3 [100, 60] B3[60]
# · · · · · Y3 [batch, 60]
# \x/x\x/ -- fully connected layer (relu) W4 [60, 30] B4[30]
# · · · Y4 [batch, 30]
# \x/ -- fully connected layer (softmax) W5 [30, 10] B5[10]
# · Y5 [batch, 10]
model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 inputs, 200 outputs; the input shape must match the data
model.add(Activation('relu'))  # the activation is switched from sigmoid to ReLU
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(60))
model.add(Activation('relu'))
model.add(Dense(30))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
adam = Adam(lr=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
# Model summary
model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_16 (Dense) (None, 200) 157000 dense_input_4[0][0]
____________________________________________________________________________________________________
activation_16 (Activation) (None, 200) 0 dense_16[0][0]
____________________________________________________________________________________________________
dense_17 (Dense) (None, 100) 20100 activation_16[0][0]
____________________________________________________________________________________________________
activation_17 (Activation) (None, 100) 0 dense_17[0][0]
____________________________________________________________________________________________________
dense_18 (Dense) (None, 60) 6060 activation_17[0][0]
____________________________________________________________________________________________________
activation_18 (Activation) (None, 60) 0 dense_18[0][0]
____________________________________________________________________________________________________
dense_19 (Dense) (None, 30) 1830 activation_18[0][0]
____________________________________________________________________________________________________
activation_19 (Activation) (None, 30) 0 dense_19[0][0]
____________________________________________________________________________________________________
dense_20 (Dense) (None, 10) 310 activation_19[0][0]
____________________________________________________________________________________________________
activation_20 (Activation) (None, 10) 0 dense_20[0][0]
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________
SVG(model_to_dot(model).create(prog='dot', format='svg'))
tensorboard3 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_relu/epoch', histogram_freq=0)
my_tensorboard3 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_relu/batch')
model.fit(x_train_1, y_train_1,
nb_epoch=30,
verbose=0,
batch_size=100,
callbacks=[my_tensorboard3, tensorboard3])
<keras.callbacks.History at 0xe3c6d50>
# The names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
verbose=1,
batch_size=100)
['loss', 'acc']
9600/10000 [===========================>..] - ETA: 0s
[0.017244604945910281, 0.99598000288009647]
7. Dropout
Run mnist_2.1_five_layers_relu_lrdecay.py from this directory.
As the number of iterations grows, a large gap opens up between the training loss and the test loss: the train loss keeps improving while the test loss gets worse and worse, the gap between them keeps widening, and the model starts to overfit.
Dropout temporarily removes units from the network with a given probability during training, which counteracts overfitting.
You can compare the results of mnist_2.1_five_layers_relu_lrdecay.py with the dropout version mnist_2.2_five_layers_relu_lrdecay_dropout.py; a sketch of the mechanism follows.
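To make the mechanism concrete, here is a tiny NumPy sketch (illustrative only, not how the Keras layer is invoked) of inverted dropout, the formulation Keras uses: each unit is kept with probability 1 - rate and the surviving activations are rescaled by 1/(1 - rate) during training, so nothing needs to change at test time.

```python
import numpy as np
np.random.seed(0)

rate = 0.25                 # same rate as Dropout(0.25) below
activations = np.ones(8)    # pretend these are the outputs of a hidden layer

keep_mask = np.random.binomial(1, 1 - rate, size=activations.shape)
train_output = activations * keep_mask / (1 - rate)  # dropped units become 0, kept units are scaled up
test_output = activations                             # at test time the layer is the identity

print(keep_mask)     # a 0/1 mask; roughly 25% of the entries are 0, depending on the seed
print(train_output)
```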
# neural network with 5 layers
#
# · · · · · · · · · · (input data, flattened pixels) X [batch, 784] # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/ -- fully connected layer (relu+dropout) W1 [784, 200] B1[200]
# · · · · · · · · · Y1 [batch, 200]
# \x/x\x/x\x/x\x/ -- fully connected layer (relu+dropout) W2 [200, 100] B2[100]
# · · · · · · · Y2 [batch, 100]
# \x/x\x/x\x/ -- fully connected layer (relu+dropout) W3 [100, 60] B3[60]
# · · · · · Y3 [batch, 60]
# \x/x\x/ -- fully connected layer (relu+dropout) W4 [60, 30] B4[30]
# · · · Y4 [batch, 30]
# \x/ -- fully connected layer (softmax) W5 [30, 10] B5[10]
# · Y5 [batch, 10]
model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 inputs, 200 outputs; the input shape must match the data
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))  # add a Dropout layer: randomly drop 25% of the units during training
model.add(Dense(60))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(30))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(10))
model.add(Activation('softmax'))
adam = Adam(lr=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
# Model summary
model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dense_171 (Dense) (None, 200) 157000 dense_input_35[0][0]
____________________________________________________________________________________________________
activation_171 (Activation) (None, 200) 0 dense_171[0][0]
____________________________________________________________________________________________________
dense_172 (Dense) (None, 100) 20100 activation_171[0][0]
____________________________________________________________________________________________________
activation_172 (Activation) (None, 100) 0 dense_172[0][0]
____________________________________________________________________________________________________
dropout_100 (Dropout) (None, 100) 0 activation_172[0][0]
____________________________________________________________________________________________________
dense_173 (Dense) (None, 60) 6060 dropout_100[0][0]
____________________________________________________________________________________________________
activation_173 (Activation) (None, 60) 0 dense_173[0][0]
____________________________________________________________________________________________________
dropout_101 (Dropout) (None, 60) 0 activation_173[0][0]
____________________________________________________________________________________________________
dense_174 (Dense) (None, 30) 1830 dropout_101[0][0]
____________________________________________________________________________________________________
activation_174 (Activation) (None, 30) 0 dense_174[0][0]
____________________________________________________________________________________________________
dropout_102 (Dropout) (None, 30) 0 activation_174[0][0]
____________________________________________________________________________________________________
dense_175 (Dense) (None, 10) 310 dropout_102[0][0]
____________________________________________________________________________________________________
activation_175 (Activation) (None, 10) 0 dense_175[0][0]
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________
SVG(model_to_dot(model).create(prog='dot', format='svg'))
tensorboard4 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_relu_dropout/epoch')
my_tensorboard4 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_relu_dropout/batch')
model.fit(x_train_1, y_train_1,
nb_epoch=30,
verbose=0,
batch_size=100,
callbacks=[tensorboard4, my_tensorboard4])
<keras.callbacks.History at 0x27819610>
# The names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
verbose=1,
batch_size=100)
['loss', 'acc']
9900/10000 [============================>.] - ETA: 0s
[0.025450729207368569, 0.99462999999523161]