The first three posts in this series were about learning and understanding; if you are familiar with the first two, the hands-on part in this post will be remarkably easy!
Goals of this post
- Understand how BN is implemented
- Observe the effect of BN
- Know the implementation details of BN in the training and inference phases
Series index
Understanding Batch Normalization, Part 1: The Principle
Understanding Batch Normalization, Part 2: Training and Evaluation
Understanding Batch Normalization, Part 3: Why It Works, and Some Discussion
Understanding Batch Normalization, Part 4: Practice
Contents
Building the networks
Training the networks
Effect on the learning curve
Effect on the weighted sums
The problem at test time
Details for practical use
Other experiments to try
Based on tensorflow 1.5.0 and python 3.6.8; the experiments use the MNIST dataset.
import numpy as np, tensorflow as tf, tqdm
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
%matplotlib inline
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print(tf.__version__)
import platform
print(platform.python_version())
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
1.5.0
3.6.8
Building the networks
Build two fully connected neural networks:
- A plain network with 2 hidden layers and 1 output layer.
- A BN network with 2 hidden layers and 1 output layer.
The BN in layer 1 is written by hand; the BN in layers 2 and 3 calls the tensorflow implementation.
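For reference, the hand-written layer below implements exactly the transform from the BN paper: for a mini-batch of weighted sums $z_1, \dots, z_m$,

$$\mu_B=\frac{1}{m}\sum_{i=1}^{m}z_i,\qquad \sigma_B^2=\frac{1}{m}\sum_{i=1}^{m}(z_i-\mu_B)^2,\qquad \hat{z}_i=\frac{z_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}},\qquad y_i=\gamma\,\hat{z}_i+\beta,$$

where $\gamma$ and $\beta$ correspond to the scale and beta variables in the code.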
Define the input placeholders and the weights of the three layers, for use later on.
w1_initial = np.random.normal(size=(784,100)).astype(np.float32)
w2_initial = np.random.normal(size=(100,100)).astype(np.float32)
w3_initial = np.random.normal(size=(100,10)).astype(np.float32)
# A very small number for the BN layers, to avoid the extreme case of division by zero.
epsilon = 1e-3
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
Layer 1: without BN
w1 = tf.Variable(w1_initial)
b1 = tf.Variable(tf.zeros([100]))
z1 = tf.matmul(x,w1)+b1
l1 = tf.nn.sigmoid(z1)
Layer 1: with BN (hand-written BN layer)
w1_BN = tf.Variable(w1_initial)
# With BN, the bias b is redundant (its role is taken over by beta below), so it is omitted.
z1_BN = tf.matmul(x,w1_BN)
# Compute the mean and variance of the weighted sums; axis 0 is the batch dimension.
batch_mean1, batch_var1 = tf.nn.moments(z1_BN,[0])
# Normalize
z1_hat = (z1_BN - batch_mean1) / tf.sqrt(batch_var1 + epsilon)
# Create two new variables, scale and beta
scale1 = tf.Variable(tf.ones([100]))
beta1 = tf.Variable(tf.zeros([100]))
# Scale and shift to get BN1, i.e. y in the BN paper
BN1 = scale1 * z1_hat + beta1
# l1_BN = tf.nn.sigmoid(BN1)
l1_BN = tf.nn.relu(BN1)
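As a quick sanity check (a sketch added here, not part of the original notebook; BN1_check and max_diff are names introduced only for illustration), the hand-written BN above should match the tf.nn.batch_normalization op that layers 2 and 3 use below:

# Optional check: the built-in op, fed the same statistics and parameters,
# should reproduce the hand-written BN1 up to floating-point error.
BN1_check = tf.nn.batch_normalization(z1_BN, batch_mean1, batch_var1, beta1, scale1, epsilon)
max_diff = tf.reduce_max(tf.abs(BN1 - BN1_check))   # evaluates to ~0 in a session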
Layer 2: without BN
w2 = tf.Variable(w2_initial)
b2 = tf.Variable(tf.zeros([100]))
z2 = tf.matmul(l1,w2)+b2
# l2 = tf.nn.sigmoid(z2)
l2 = tf.nn.relu(z2)
Layer 2: with BN (BN layer built with tensorflow)
w2_BN = tf.Variable(w2_initial)
z2_BN = tf.matmul(l1_BN,w2_BN)
# Compute the mean and variance of the weighted sums; axis 0 is the batch dimension.
batch_mean2, batch_var2 = tf.nn.moments(z2_BN,[0])
# Create two new variables, scale and beta
scale2 = tf.Variable(tf.ones([100]))
beta2 = tf.Variable(tf.zeros([100]))
# Scale and shift to get BN2, i.e. y in the BN paper, using tf.nn.batch_normalization
BN2 = tf.nn.batch_normalization(z2_BN,batch_mean2,batch_var2,beta2,scale2,epsilon)
# l2_BN = tf.nn.sigmoid(BN2)
l2_BN = tf.nn.relu(BN2)
Layer 3: without BN
w3 = tf.Variable(w3_initial)
b3 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(l2,w3)+b3)
Layer 3: with BN (BN layer built with tensorflow)
# w3_BN = tf.Variable(w3_initial)
# b3_BN = tf.Variable(tf.zeros([10]))
# y_BN = tf.nn.softmax(tf.matmul(l2_BN,w3_BN)+b3_BN)
w3_BN = tf.Variable(w3_initial)
z3_BN = tf.matmul(l2_BN,w3_BN)
batch_mean3, batch_var3 = tf.nn.moments(z3_BN,[0])
scale3 = tf.Variable(tf.ones([10]))
beta3 = tf.Variable(tf.zeros([10]))
BN3 = tf.nn.batch_normalization(z3_BN,batch_mean3,batch_var3,beta3,scale3,epsilon)
# print(BN3.get_shape())
y_BN = tf.nn.softmax(BN3)
For the plain network and the BN network, define three ops each: the loss, the optimizer, and the accuracy.
- The loss is the cross entropy, since the activation of the output layer is softmax (see the note on numerical stability after this list).
- The optimizer is plain gradient descent.
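One caveat about this loss: -tf.reduce_sum(y_*tf.log(y)) becomes NaN as soon as any softmax output underflows to 0. The cells below keep this simple form; a numerically safer sketch (using the standard TF 1.x op on pre-softmax logits; the names logits and cross_entropy_safe are introduced here only for illustration) would be:

# Safer variant (sketch only, not used in the rest of this notebook):
# compute the loss from the logits rather than from the softmax probabilities.
logits = tf.matmul(l2, w3) + b3      # pre-softmax scores of the plain network, reusing the layer-3 variables above
cross_entropy_safe = tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))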
# Loss of the plain network
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
# Loss of the BN network
cross_entropy_BN = -tf.reduce_sum(y_*tf.log(y_BN))
# Optimizer of the plain network
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# Optimizer of the BN network
train_step_BN = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy_BN)
# Accuracy of the plain network
correct_prediction = tf.equal(tf.arg_max(y,1),tf.arg_max(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
# Accuracy of the BN network (accuracy_BN)
correct_prediction_BN = tf.equal(tf.arg_max(y_BN,1),tf.arg_max(y_,1))
accuracy_BN = tf.reduce_mean(tf.cast(correct_prediction_BN,tf.float32))
WARNING:tensorflow:From <ipython-input-9-175577f60212>:12: arg_max (from tensorflow.python.ops.gen_math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `argmax` instead
Training the networks
- Train the plain network and the BN network.
- Compare the learning curves during training.
- Compare the effect of BN on the input weighted sums.
First, train the plain network and the BN network.
# zs stores the pre-activation weighted-sum vectors of the 2nd hidden layer of the plain network.
# BNs stores the pre-activation weighted-sum vectors of the 2nd hidden layer of the BN network (after BN).
# acc, acc_BN store the test-set accuracies measured during training, used to plot the learning curves.
zs, BNs, acc, acc_BN = [], [], [], []

# Open one session and run both train_step and train_step_BN in it
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for i in tqdm.tqdm(range(40000)):
    batch = mnist.train.next_batch(60)
    # Run train_step to train the network without BN
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
    # Run train_step_BN to train the network with BN
    train_step_BN.run(feed_dict={x: batch[0], y_: batch[1]})
    if i % 50 == 0:
        # Every 50 batches, measure the test accuracy and also evaluate the 2nd layer's
        # weighted input before the BN layer (z2) and after the BN layer (BN2)
        res = sess.run([accuracy, accuracy_BN, z2, BN2], feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        # Record the training-phase accuracies: acc for the network without BN, acc_BN for the network with BN
        acc.append(res[0])
        acc_BN.append(res[1])
        # Record the history of z2 and BN2 during training
        zs.append(np.mean(res[2], axis=0))
        BNs.append(np.mean(res[3], axis=0))

zs, BNs, acc, acc_BN = np.array(zs), np.array(BNs), np.array(acc), np.array(acc_BN)
100%|████████████████████████████████████| 40000/40000 [06:54<00:00, 96.44it/s]
Effect on the learning curve
Compare the learning curves of the training phase:
The accuracy curves show that adding BN greatly improves training efficiency.
fig, ax = plt.subplots()
ax.plot(range(0,len(acc)*50,50),acc, label='Without BN')
ax.plot(range(0,len(acc)*50,50),acc_BN, label='With BN')
ax.set_xlabel('Training steps')
ax.set_ylabel('Accuracy')
# ax.set_ylim([0.8,1])
ax.set_title('Batch Normalization Accuracy')
ax.legend(loc=4)
plt.show()
對(duì)加權(quán)和的影響
zs 來自無BN網(wǎng)絡(luò)的第二個(gè)隱層的輸入加權(quán)和向量矫废,即下一步將喂給本層的激活函數(shù)盏缤。
BNs 來自有BN網(wǎng)絡(luò)的第二個(gè)隱層的輸入加權(quán)和經(jīng)過BN層處理后的向量,即下一步也將喂給本層的激活函數(shù)蓖扑。
- 效果:沒有BN唉铜,則網(wǎng)絡(luò)的加權(quán)和完全跑飛了;有BN律杠,則加權(quán)和會(huì)被約束在0附近潭流。
# For both networks (without and with BN), show the range of the weighted-sum inputs of
# 5 neurons in the 2nd hidden layer over the 800 recorded forward passes.
fig, axes = plt.subplots(5, 2, figsize=(6,12))
# fig, axes = plt.subplots(5, 2)
fig.tight_layout()
for i, ax in enumerate(axes):
    ax[0].set_title("Without BN")
    ax[1].set_title("With BN")
    # [:, i] takes one column, i.e. the history of the i-th neuron's weighted-sum input
    # print(zs[:,i].shape)
    ax[0].plot(zs[:,i])
    ax[1].plot(BNs[:,i])
plt.show()
The problem at test time
A network with BN built as above cannot be used directly for testing.
If test samples are fed in one at a time, that amounts to batch_size = 1: the batch mean is the sample itself and the batch variance is 0, so everything is normalized to 0.
The input to each subsequent layer is then always a zero vector, so the prediction made from the trained weights is very likely wrong.
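A tiny numeric illustration of the failure (plain numpy, not from the original notebook):

# With batch_size = 1, the batch mean is the sample itself and the batch variance is 0,
# so after normalization every feature becomes 0 regardless of the input
# (the layer output is then just beta).
import numpy as np
z = np.array([[2.7, -1.3, 0.5]])            # a single sample, i.e. batch_size = 1
mean, var = z.mean(axis=0), z.var(axis=0)   # mean == z[0], var == [0., 0., 0.]
z_hat = (z - mean) / np.sqrt(var + 1e-3)    # -> [[0. 0. 0.]] for any input values
print(z_hat)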
predictions = []
correct = 0
for i in range(100):
    pred, corr = sess.run([tf.arg_max(y_BN,1), accuracy_BN],
                          feed_dict={x: [mnist.test.images[i]], y_: [mnist.test.labels[i]]})
    # Accumulate, to compute the average accuracy over the 100 predictions
    correct += corr
    # Save each prediction
    predictions.append(pred[0])
print("PREDICTIONS:", predictions)
print("ACCURACY:", correct/100)
sess.close()
# Result: no matter which image is fed in, the prediction is the same, because every image
# is normalized to an all-zero vector within its own single-sample mini-batch.
PREDICTIONS: [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
ACCURACY: 0.02
Details for practical use
To avoid the problem above at inference (prediction) time, pay attention to the following points:
Building the BN layer
batch_norm_wrapper combines two higher-level behaviors in one function:
- during training, it accumulates the mean and variance of the training set (via the moving-average update shown below);
- during inference, it directly uses the statistics accumulated during training.
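Concretely, the population statistics are tracked as an exponential moving average of the per-batch statistics (with decay = 0.999 below):

$$\text{pop\_mean}\leftarrow \text{decay}\cdot\text{pop\_mean}+(1-\text{decay})\cdot\mu_B,\qquad \text{pop\_var}\leftarrow \text{decay}\cdot\text{pop\_var}+(1-\text{decay})\cdot\sigma_B^2.$$

This is exactly what the two tf.assign ops in the wrapper implement.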
# batch_norm_wrapper re-implements the core behavior of the BN layer in tensorflow.
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L102
# For the weighted input of every layer in every batch:
# during training, it computes the batch mean and variance while recording and updating the population mean and variance;
# at test/evaluation time, it directly uses the population mean and variance accumulated during training.
def batch_norm_wrapper(inputs, is_training, decay=0.999):
    # Each BN layer introduces 4 variables: scale, beta, pop_mean, pop_var, where:
    # scale and beta are trainable and are saved as model parameters after training;
    # pop_mean and pop_var are not trainable and are only updated as statistics during training,
    # but they are still saved as model variables: the graph rebuilt at test time connects to them,
    # so loading the trained parameters is all that is needed.
    scale = tf.Variable(tf.ones([inputs.get_shape()[-1]]))
    beta = tf.Variable(tf.zeros([inputs.get_shape()[-1]]))
    pop_mean = tf.Variable(tf.zeros([inputs.get_shape()[-1]]), trainable=False)
    pop_var = tf.Variable(tf.ones([inputs.get_shape()[-1]]), trainable=False)

    if is_training:
        # Training-time part of the BN graph.
        # batch_mean and batch_var are computed once per layer per batch in the forward pass;
        # in backprop they are used to compute the gradient of this layer's weighted input,
        # i.e. they only serve to propagate gradients through the network and are discarded after training.
        batch_mean, batch_var = tf.nn.moments(inputs, [0])
        # Fold each batch's statistics into the population estimates pop_mean and pop_var
        # via an exponential moving average.
        # tf.assign adds an operation to the graph that assigns
        # pop_mean * decay + batch_mean * (1 - decay) to pop_mean.
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
        # Make sure both train_mean and train_var have run before BN is applied.
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(inputs, batch_mean, batch_var, beta, scale, epsilon)
    else:
        # Test-time part of the BN graph: directly use the trained model's beta and scale,
        # and the pop_mean and pop_var saved with the trained model.
        return tf.nn.batch_normalization(inputs, pop_mean, pop_var, beta, scale, epsilon)
Building the graph
The BN wrapper defined above makes adding a BN layer to each layer concise.
def build_graph(is_training):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_")

    w1 = tf.Variable(w1_initial)
    z1 = tf.matmul(x, w1)
    bn1 = batch_norm_wrapper(z1, is_training)
    l1 = tf.nn.sigmoid(bn1)

    w2 = tf.Variable(w2_initial)
    z2 = tf.matmul(l1, w2)
    bn2 = batch_norm_wrapper(z2, is_training)
    l2 = tf.nn.sigmoid(bn2)

    w3 = tf.Variable(w3_initial)
    # b3 = tf.Variable(tf.zeros([10]))
    # y = tf.nn.softmax(tf.matmul(l2, w3))
    z3 = tf.matmul(l2, w3)
    bn3 = batch_norm_wrapper(z3, is_training)
    y = tf.nn.softmax(bn3)

    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.arg_max(y, 1), tf.arg_max(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

    return (x, y_), train_step, accuracy, y
Training phase
- Passing is_training=True turns on the graph ops that accumulate the mean and variance.
- After training, save the model, i.e. the graph plus the parameters; in fact only the parameters will be used, because a fresh graph is built at prediction time.
tf.reset_default_graph()
(x, y_), train_step, accuracy, _ = build_graph(is_training=True)

acc = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in tqdm.tqdm(range(10000)):
        batch = mnist.train.next_batch(60)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})
        if i % 200 == 0:
            res = sess.run([accuracy], feed_dict={x: mnist.test.images, y_: mnist.test.labels})
            acc.append(res[0])
            # print('batch:', i, ' accuracy:', res[0])
    # Save the model. Note that the saved graph is the training graph, so the saved model
    # cannot be used for inference as-is; only its parameters will be reused.
    saver = tf.train.Saver()
    # saved_model = saver.save(sess, './temp-bn-save')
    saver.save(sess, './bn_test/temp-bn-save')
    writer = tf.summary.FileWriter('./improved_graph2', sess.graph)
    writer.flush()
    writer.close()
print("Final accuracy:", acc[-1])
100%|███████████████████████████████████| 10000/10000 [00:38<00:00, 260.31it/s]
Final accuracy: 0.9538
Test phase
First build the inference graph, then load the trained model parameters into that graph.
tf.reset_default_graph()
# (x, y_), _, accuracy, y, saver = build_graph(is_training=False)
(x, y_), _, accuracy, y = build_graph(is_training=False)

predictions = []
correct = 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Restore the trained model: the learned weights and the estimated population mean and
    # variance are loaded into the running graph via restore.
    saver = tf.train.Saver()
    saver.restore(sess, './bn_test/temp-bn-save')
    saver.save(sess, './bn_release/temp-bn-save')
    for i in range(100):
        pred, corr = sess.run([tf.arg_max(y, 1), accuracy],
                              feed_dict={x: [mnist.test.images[i]], y_: [mnist.test.labels[i]]})
        correct += corr
        predictions.append(pred[0])
print("PREDICTIONS:", predictions)
print("ACCURACY:", correct/100)
INFO:tensorflow:Restoring parameters from ./bn_test/temp-bn-save
PREDICTIONS: [7, 2, 1, 0, 4, 1, 4, 9, 6, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4, 9, 6, 6, 5, 4, 0, 7, 4, 0, 1, 3, 1, 3, 4, 7, 2, 7, 1, 2, 1, 1, 7, 4, 2, 3, 5, 1, 2, 4, 4, 6, 3, 5, 5, 6, 0, 4, 1, 9, 7, 7, 8, 9, 3, 7, 4, 1, 4, 3, 0, 7, 0, 2, 9, 1, 7, 3, 2, 9, 7, 7, 6, 2, 7, 8, 4, 7, 3, 6, 1, 3, 6, 9, 3, 1, 4, 1, 7, 6, 9]
ACCURACY: 0.97
Other experiments to try
If you are interested, keep experimenting on top of the code above.
Where to place BN
After the weighted sum vs. before the weighted sum
Close to the input layer vs. close to the output layer
How much BN to add
BN on every layer vs. BN on only a few layers
BN and the learning rate (a sketch follows this list)
Large learning rate vs. small learning rate
Effect of batch_size on BN
Small batch_size vs. large batch_size
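For example, for the learning-rate experiment, a minimal sketch (the helper make_train_steps is a hypothetical name; it simply reuses the losses defined earlier in this notebook):

# Rebuild both optimizers with a configurable learning rate, then rerun the training loop above.
def make_train_steps(lr):
    step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)
    step_BN = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy_BN)
    return step, step_BN

# e.g. compare the lr = 0.01 used above with a much larger rate:
# train_step, train_step_BN = make_train_steps(0.5)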