For a better reading experience, click here.
Contents:
- Parameters vs Hyperparameters
- Why use deep neural networks (why deep representations)
- Building a deep neural network step by step
- Determining each layer's parameter matrix dimensions & random initialization
- Implementing a deep neural network with TensorFlow
0 - Notation
1 - Parameters & Hyperparameters
Parameters are configuration variables internal to the model, whose values are estimated from the training data. Hyperparameters are configuration settings external to the model, whose values must be set by hand. Because the choice of hyperparameters strongly affects the learned parameters, hyperparameters can be thought of as parameters that control the parameters. In practice, training the same neural network with different hyperparameter settings can produce dramatically different results.
Common parameters: $W$, $b$, etc.
Common hyperparameters: the learning rate, the number of iterations, the number of hidden layers, the number of hidden units, etc.
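To make the distinction concrete, here is a small illustrative sketch (the names below are hypothetical, not from the course code): hyperparameters are fixed by hand before training starts, while parameters such as $W$ and $b$ are produced by the training process itself.

```python
# Hyperparameters: chosen by hand before training ever starts.
hyperparameters = {
    "learning_rate": 0.0075,              # step size for gradient descent
    "num_iterations": 2500,               # how many training iterations to run
    "layers_dims": [12288, 20, 7, 5, 1],  # number of layers and units per layer
}

# Parameters: estimated from the training data by the training loop itself,
# e.g. parameters["W1"], parameters["b1"], ... (hypothetical training call below)
# parameters = train_model(train_x, train_y, **hyperparameters)
```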
2 - Why use deep neural networks (why deep representations)
Informally: There are functions you can compute with a "small" L-layer deep neural network that shallower networks require exponentially more hidden units to compute.
In other words, when a neural network has to approximate a complex function, adding layers is usually far more effective than adding more units to the existing layers.
Intuitively, extra layers let the network express progressively richer information, whereas adding more units to a single layer improves its expressive power only marginally. Take image recognition with a CNN (convolutional neural network) as an example: recognition of the target proceeds hierarchically. The first hidden layer detects edges, the second hidden layer detects parts of the object, and by the third hidden layer the outline of the whole object becomes visible. Adding more units to the first hidden layer would merely improve edge detection; it still could not detect the object parts in the image.
3 - Building a deep neural network step by step
Suppose the deep neural network to be implemented has $L$ layers ($L-1$ of them hidden). First, randomly initialize the parameters of every layer. Next comes forward propagation; note that the hidden layers should use the ReLU activation function, which helps prevent vanishing gradients as the depth grows. Then compute the loss. After that comes backward propagation, which must use the same activation functions as the forward pass. Next, gradient descent updates the parameters. The cycle [forward propagation] -> [compute loss] -> [backward propagation] -> [update parameters] is iterated num_iterations times. Finally, the trained model is used to make predictions on the test set.
The overall implementation framework is shown in the figure below:
A concise way to write the forward pass: [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID
Therefore, a Python implementation of the deep neural network framework shown in the figure above needs six parts:
- Randomly initialize the parameters of the $L$-layer neural network
- Implement the forward-propagation module (the purple part of the figure above)
    - Compute the linear part for a single layer and store the result as $Z^{[l]}$
    - Apply the activation function (ReLU/sigmoid)
    - Combine the two steps above into a new [LINEAR->ACTIVATION] forward function
    - Stack the forward functions: repeat [LINEAR->RELU] L-1 times for layers 1 through L-1 (i.e. all hidden layers), then append one [LINEAR->SIGMOID] for layer L, the output layer
- Compute the loss
- Implement the backward-propagation module (the red part of the figure above)
    - Backward propagation computes the gradients of the loss function with respect to the current parameters
- Update the parameters
- A prediction function
The Python implementation of these six parts is largely the same as the single-hidden-layer network written earlier; what deserves attention is how the dimensions of the parameter matrices change from layer to layer.
Note: the outline above is taken from this week's programming assignment; for the complete Python implementation, please see the assignment.
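To make the flow concrete, here is a compact NumPy sketch of the whole [initialize] -> [forward] -> [loss] -> [backward] -> [update] loop. It is a simplified illustration written for this note (helper names such as `init_params` and `forward` are my own, not the assignment's), so treat it as a sketch rather than the assignment solution.

```python
# Minimal sketch of [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID with gradient descent.
import numpy as np

def init_params(layers_dims, seed=1):
    """W[l]: (n[l], n[l-1]) small random values, b[l]: (n[l], 1) zeros."""
    rng = np.random.RandomState(seed)
    params = {}
    for l in range(1, len(layers_dims)):
        params['W' + str(l)] = rng.randn(layers_dims[l], layers_dims[l - 1]) * 0.01
        params['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return params

def forward(X, params):
    """ReLU for hidden layers, sigmoid for the output layer; caches Z and A for backprop."""
    L = len(params) // 2
    caches = {'A0': X}
    A = X
    for l in range(1, L + 1):
        Z = params['W' + str(l)] @ A + params['b' + str(l)]
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))
        caches['Z' + str(l)], caches['A' + str(l)] = Z, A
    return A, caches

def compute_loss(AL, Y):
    """Binary cross-entropy averaged over the m examples."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(AL + 1e-8) + (1 - Y) * np.log(1 - AL + 1e-8)) / m

def backward(Y, params, caches):
    """Gradients of the loss with respect to every W[l] and b[l]."""
    grads = {}
    L = len(params) // 2
    m = Y.shape[1]
    dZ = caches['A' + str(L)] - Y  # sigmoid + cross-entropy simplifies to this at the output
    for l in range(L, 0, -1):
        A_prev = caches['A' + str(l - 1)]
        grads['dW' + str(l)] = dZ @ A_prev.T / m
        grads['db' + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
        if l > 1:
            dA_prev = params['W' + str(l)].T @ dZ
            dZ = dA_prev * (caches['Z' + str(l - 1)] > 0)  # ReLU derivative
    return grads

def update(params, grads, learning_rate):
    for key in params:
        params[key] -= learning_rate * grads['d' + key]
    return params

# Tiny usage example on random data (shapes only, not the cat dataset):
layers_dims = [12288, 20, 7, 5, 1]
X = np.random.rand(12288, 16)            # 16 fake examples
Y = (np.random.rand(1, 16) > 0.5) * 1.0  # fake 0/1 labels
params = init_params(layers_dims)
for i in range(100):                      # num_iterations
    AL, caches = forward(X, params)
    loss = compute_loss(AL, Y)
    grads = backward(Y, params, caches)
    params = update(params, grads, learning_rate=0.0075)
```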
4 - 確定各層參數(shù)矩陣維度 & 隨機(jī)初始化
與單隱層神經(jīng)網(wǎng)絡(luò)一樣东臀,深度神經(jīng)網(wǎng)絡(luò)的參數(shù)初始化也必須是隨機(jī)的,絕對(duì)不能將所有參數(shù)初始化為0犀农。通常惰赋,使用標(biāo)準(zhǔn)正態(tài)分布來(lái)初始化$W$參數(shù),$b$可以初始化為0呵哨。
對(duì)于深度神經(jīng)網(wǎng)絡(luò)而言赁濒,參數(shù)初始化的另外一個(gè)難點(diǎn)是確定每一層的參數(shù)矩陣維度,這里非常容易出錯(cuò)孟害。假設(shè)輸入數(shù)據(jù)集$X$的大小為$(12288, 209)$ (with $m=209$ examples)拒炎,那么各層參數(shù)矩陣維度可以用下表表示:
where $n^{[l]}$ denotes the number of units in layer $l$. Looking closely at the table, the variables of layer $l$ have the dimensions $W^{[l]}: (n^{[l]}, n^{[l-1]})$, $b^{[l]}: (n^{[l]}, 1)$, $dW^{[l]}: (n^{[l]}, n^{[l-1]})$, $db^{[l]}: (n^{[l]}, 1)$, and $Z^{[l]}, A^{[l]}: (n^{[l]}, m)$.
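As a quick sanity check of these rules, the following small snippet (using the same layer sizes as the TensorFlow model in section 5, and $m = 209$) prints the shape of every $W^{[l]}$, $b^{[l]}$ and $Z^{[l]}$:

```python
import numpy as np

# Layer sizes of the model used in section 5 below; m = 209 training examples.
layers_dims = [12288, 20, 7, 5, 1]
m = 209

A = np.zeros((layers_dims[0], m))                        # A[0] = X, shape (12288, 209)
for l in range(1, len(layers_dims)):
    W = np.zeros((layers_dims[l], layers_dims[l - 1]))   # W[l]: (n[l], n[l-1])
    b = np.zeros((layers_dims[l], 1))                    # b[l]: (n[l], 1)
    A = W @ A + b                                        # Z[l]/A[l]: (n[l], m); b broadcasts
    print("layer %d: W %s, b %s, Z %s" % (l, W.shape, b.shape, A.shape))
```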
5 - Implementing a deep neural network with TensorFlow
In this week's programming assignment, Ng provides a cat dataset together with the deep neural network model below for recognizing whether a picture contains a cat.
Here is a TensorFlow implementation; the code is as follows:
import h5py
import numpy as np
import tensorflow as tf
# Load the datasets
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # train set features (209, 64, 64, 3)
train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # train set labels (209,)
test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # test set features (50, 64, 64, 3)
test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # test set labels (50,)
classes = np.array(test_dataset["list_classes"][:]) # the list of classes
train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
# Reshape the training and test examples
train_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
# The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.
# Enable TensorBoard
import os
if not os.path.exists('log'):
    os.mkdir('log')
logdir = os.path.join(os.getcwd(), 'log')
print('To launch TensorBoard, copy the following command (the port can be changed) into cmd and run it:\n',
      'tensorboard --logdir=%s --port=6068' % logdir)
To launch TensorBoard, copy the following command (the port can be changed) into cmd and run it:
tensorboard --logdir=C:\Users\Mike\Documents\Blog\manuscripts\log --port=6068
# Define the network architecture
layers_dims = [12288, 20, 7, 5, 1]
# The input layer (first number) and the output layer (last number) must not be changed;
# the hidden layers in between can be modified: adding a number adds a hidden layer,
# and each number is the number of units in that hidden layer.
train_size = train_set_x_orig.shape[0]
L = len(layers_dims)
# Randomly initialize the parameters of a single layer
def init_parameters(shape):
    W = tf.get_variable('W', shape, initializer=tf.truncated_normal_initializer(stddev=0.1, seed=1))
    b = tf.get_variable('b', shape=(shape[0], 1), initializer=tf.constant_initializer(0.0))
    return W, b
# Forward propagation
def L_forward_propagation(X, L):
    for i in range(1, L-1):
        if i == 1:
            A_prev = X
        else:
            A_prev = A
        with tf.variable_scope('hidden_layer' + str(i)):
            W, b = init_parameters(shape=(layers_dims[i], layers_dims[i-1]))
            A = tf.nn.relu(tf.matmul(W, A_prev) + b)
    # output layer
    with tf.variable_scope('output_layer'):
        W, b = init_parameters(shape=(layers_dims[L-1], layers_dims[L-2]))
        Y = tf.matmul(W, A) + b  # linear output only; no activation here (the sigmoid is folded into the loss)
    return Y
# Input data
with tf.name_scope('input_layer'):
    X = tf.placeholder(tf.float32, shape=(layers_dims[0], train_size), name='X')
    y_ = tf.placeholder(tf.float32, shape=(layers_dims[-1], train_size), name='Y_label')
y = L_forward_propagation(X, L)
# Name scope for the loss computation
with tf.name_scope("loss_function"):
    cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=y_)
    loss = tf.reduce_mean(cross_entropy)
# Train the model
learning_rate = 0.05
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
# train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
# Write the computation graph to the log directory
summary_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
summary_writer.close()
num_iterations = 1600
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(num_iterations):
        _, total_loss = sess.run([train_step, loss], feed_dict={X: train_x, y_: train_set_y_orig})
        if i % 100 == 0:
            print("After %i training step(s), loss on training set is %s." % (i, str(total_loss)))
After 0 training step(s), loss on training set is 0.693251.
After 100 training step(s), loss on training set is 0.644346.
After 200 training step(s), loss on training set is 0.634704.
After 300 training step(s), loss on training set is 0.564144.
After 400 training step(s), loss on training set is 0.585186.
After 500 training step(s), loss on training set is 0.426819.
After 600 training step(s), loss on training set is 0.381561.
After 700 training step(s), loss on training set is 0.495546.
After 800 training step(s), loss on training set is 0.341763.
After 900 training step(s), loss on training set is 0.376317.
After 1000 training step(s), loss on training set is 0.663143.
After 1100 training step(s), loss on training set is 0.433534.
After 1200 training step(s), loss on training set is 0.183415.
After 1300 training step(s), loss on training set is 0.887958.
After 1400 training step(s), loss on training set is 0.163736.
After 1500 training step(s), loss on training set is 0.0112102.
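The run above only reports the training loss. As a small follow-up sketch (not part of the original assignment code), accuracy could be checked by thresholding the sigmoid of the logits; because the placeholders were declared with a fixed `train_size` number of columns, this particular graph can only be evaluated on the training set:

```python
# Added inside the `with tf.Session() as sess:` block, after the training loop:
predicted = tf.cast(tf.sigmoid(y) > 0.5, tf.float32)      # threshold the sigmoid output at 0.5
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, y_), tf.float32))
train_accuracy = sess.run(accuracy, feed_dict={X: train_x, y_: train_set_y_orig})
print("Accuracy on the training set: %f" % train_accuracy)
```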