Configure
Configures the hyper-parameters of the project; corresponds to common.py in the 01-svhn project.
minibatch_size = 128 # the number of instances in a batch
nr_channel = 3 # the channels of image
image_shape = (32, 32) # the image shape (height, width)
nr_class = 10 # the number of classes
nr_epoch = 60 # the max epoch of training
weight_decay = 1e-10 # a strength of regularization
test_interval = 5 # run the test set every ${test_interval} epochs
show_interval = 10 # print a training message every ${show_interval} minibatches
nr_class is set to the number of classes of the classification task.
weight_decay controls how strongly the regularization term contributes to the overall loss (a sketch of this follows the snippet below).
nr_class = 10 # the number of classes
weight_decay = 1e-10 # a strength of regularization
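To make the role of weight_decay concrete, here is a plain-NumPy sketch (not project code) of the penalty that tf_contrib.layers.l2_regularizer(weight_decay), introduced later in model.py, adds for every regularized weight tensor:
import numpy as np

def l2_penalty(weight_tensors, wd=1e-10):  # wd plays the role of weight_decay
    # each regularized tensor contributes wd * sum(w**2) / 2
    # (TensorFlow's l2_loss includes the factor 1/2)
    return sum(wd * np.sum(w ** 2) / 2.0 for w in weight_tensors)

# the objective optimized later in train.py is then, conceptually:
#   loss = softmax_cross_entropy + l2_penalty(all regularized weights)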
These two settings control how often testing and logging happen: test_interval is measured in epochs, while show_interval is measured in minibatches.
test_interval = 5 # run the test set every ${test_interval} epochs
show_interval = 10 # print a training message every ${show_interval} minibatches
Dataset
Since deep learning is data-driven, handling the data is the most important part of the whole project. The 01-svhn project dedicates the Dataset() class in dataset.py to this.
# import required modules
import os
import cv2
import numpy as np
from scipy import io as scio
from common import * # hyper-parameters such as image_shape and minibatch_size
class Dataset():
dataset_path = '../../dataset/SVHN' # a path saves dataset
dataset_meta = {
'train': ([os.path.join(dataset_path, 'train_32x32.mat')], 73257),
'test': ([os.path.join(dataset_path, 'test_32x32.mat')], 26032),
}
def __init__(self, dataset_name):
self.files, self.instances = self.dataset_meta[dataset_name]
def load(self):
'''Load dataset metas from files'''
datas_list, labels_list = [], []
for f in self.files:
samples = scio.loadmat(f)
datas_list.append(samples['X'])
labels_list.append(samples['y'])
self.samples = {
'X': np.concatenate(datas_list, axis=3), # datas
'Y': np.concatenate(labels_list, axis=0), # labels
}
return self
def instance_generator(self):
'''a generator to yield a sample'''
for i in range(self.instances):
img = self.samples['X'][:, :, :, i]
label = self.samples['Y'][i, :][0]
if label == 10:
label = 0
img = cv2.resize(img, image_shape)
yield img.astype(np.float32), np.array(label, dtype=np.int32)
@property
def instances_per_epoch(self):
return 25600 # set for a fast experiment
#return self.instances
@property
def minibatchs_per_epoch(self):
return 200 # set for a fast experiment
#return self.instances // minibatch_size
Code explained
Change dataset_path to wherever you store the dataset.
dataset_path = '../../dataset/SVHN' # a path saves dataset
dataset_meta keeps track of each split of the dataset: its file locations and its number of instances.
dataset_meta = {
'train': ([os.path.join(dataset_path, 'train_32x32.mat')], 73257),
'test': ([os.path.join(dataset_path, 'test_32x32.mat')], 26032),
}
The load function usually just unpacks the dataset and loads it into memory for later access. Another common pattern is to read the paths of all data files (e.g. images) into a list and load them lazily afterwards. (A quick look at the SVHN .mat layout follows the snippet below.)
def load(self):
'''Load dataset metas from files'''
datas_list, labels_list = [], []
for f in self.files:
samples = scio.loadmat(f)
....
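For reference, the SVHN .mat files keep the sample index on the last axis, which is why load concatenates along axis=3. A quick sanity check, assuming the paths from dataset_meta above:
from scipy import io as scio
samples = scio.loadmat('../../dataset/SVHN/train_32x32.mat')
print(samples['X'].shape) # (32, 32, 3, 73257): H x W x C x N
print(samples['y'].shape) # (73257, 1): labels run 1..10, with 10 standing for the digit 0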
A Python generator is used to produce one sample at a time, which later plugs into TensorFlow's dataset API. Each sample is a tuple: sample = (data, label).
Note: the SVHN dataset labels the digit 0 as 10 (the files come from MATLAB, where indexing starts at 1), while TensorFlow's classification setup expects labels starting from 0, so the label has to be remapped explicitly:
if label == 10:
label = 0
Show
After writing dataset.py, a small test snippet helps verify that it is correct and lets you look at the images in the dataset. Later, once you start augmenting the data, the same trick is handy for inspecting the augmented samples.
# show an img from dataset
%matplotlib inline
import matplotlib.pyplot as plt
ds = Dataset('train').load()
ds_gen = ds.instance_generator()
imggrid = []
for i in range(25):
img, label = next(ds_gen) # yield a sample
cv2.putText(img, str(label), (0, image_shape[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.6,
(0, 255, 0), 2) # put a label on img
imggrid.append(img)
# make an img grid from an img list
imggrid = np.array(imggrid).reshape((5, 5, img.shape[0], img.shape[1], img.shape[2]))
imggrid = imggrid.transpose((0, 2, 1, 3, 4)).reshape((5*img.shape[0], 5*img.shape[1], 3))
imggrid = cv2.cvtColor(imggrid.astype('uint8'), cv2.COLOR_BGR2RGB)
# show
plt.figure()
plt.imshow(imggrid)
plt.show()
This enables inline, real-time display of matplotlib figures inside a Jupyter notebook.
%matplotlib inline
import matplotlib.pyplot as plt
These two lines tile the individual images into one large grid image.
imggrid = np.array(imggrid).reshape((5, 5, img.shape[0], img.shape[1], img.shape[2]))
imggrid = imggrid.transpose((0, 2, 1, 3, 4)).reshape((5*img.shape[0], 5*img.shape[1], 3))
OpenCV stores images in BGR order while matplotlib expects RGB, so a conversion is needed before displaying.
imggrid = cv2.cvtColor(imggrid.astype('uint8'), cv2.COLOR_BGR2RGB)
Build a graph
TensorFlow uses symbolic programming: you first define a compute graph that describes the tensors, their relationships and the operators; the graph is then compiled and run with a session. The graph does not change while it is running, so keep one thing firmly in mind: defining the graph and computing with it are separate stages. Let's first look at how to build a graph. To work smoothly with the inspection tool tf-model-manip.py that we provide, you are strongly encouraged to follow the 01-svhn project layout for your own code: create a model.py containing a Model class, and give that class a build method.
The __init__ method specifies how the parameters are initialized (here the MSRA scheme; see https://arxiv.org/pdf/1502.01852.pdf if interested). In short, it is a zero-mean normal distribution whose variance is determined by the number of input neurons; changing mode changes how that variance is computed. (In the snippets below, tf is tensorflow and tf_contrib is tensorflow.contrib.)
def __init__(self):
self.weight_init = tf_contrib.layers.variance_scaling_initializer(factor=1.0,
mode='FAN_IN', uniform=False)
self.bias_init = tf.zeros_initializer()
self.reg = tf_contrib.layers.l2_regularizer(weight_decay)
The three modes (details: tensorflow-init), with a small worked example after the snippet:
if mode == 'FAN_IN':    # Count only number of input connections.
    n = fan_in
elif mode == 'FAN_OUT': # Count only number of output connections.
    n = fan_out
elif mode == 'FAN_AVG': # Average number of input and output connections.
    n = (fan_in + fan_out) / 2.0
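As a rough worked example (hypothetical numbers, using the conv1 kernel defined later): with mode='FAN_IN', a [3, 3, 3, 16] kernel has n = 3 * 3 * 3 = 27 input connections, and the initializer draws weights with variance roughly factor / n (the truncated-normal variant adjusts the raw stddev slightly):
import math
factor, fan_in = 1.0, 3 * 3 * 3 # conv1: 3x3 receptive field, 3 input channels
stddev = math.sqrt(factor / fan_in) # ~0.19
print(stddev)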
_conv_layer implements a conv2d that comes with BN and ReLU by default.
def _conv_layer(self, name, inp, kernel_shape, stride, padding='SAME', is_training=False):
'''a conv layer = conv + bn + relu'''
with tf.variable_scope(name) as scope:
conv_filter = tf.get_variable(name='filter', shape=kernel_shape,
initializer=self.weight_init, regularizer=self.reg)
conv_bias = tf.get_variable(name='bias', shape=kernel_shape[-1],
initializer=self.bias_init)
x = tf.nn.conv2d(inp, conv_filter, strides=[1, stride, stride, 1],
padding=padding, data_format='NHWC')
x = tf.nn.bias_add(x, conv_bias, data_format='NHWC')
x = tf.layers.batch_normalization(x, axis=3, training=is_training)
x = tf.nn.relu(x)
return x
tf.variable_scope creates a namespace so variable names can be reused across layers. E.g. with name='conv1', the final name of conv_filter becomes 'conv1/filter'.
with tf.variable_scope(name) as scope:
conv_filter = tf.get_variable(name='filter', ....)
In TensorFlow, conv2d is an operation; be careful to keep the strides argument consistent with data_format:
x = tf.nn.conv2d(inp, conv_filter, strides=[1, stride, stride, 1], data_format='NHWC')
x = tf.nn.conv2d(inp, conv_filter, strides=[1, 1, stride, stride], data_format='NCHW')
Batch normalization also has to match data_format: in a CNN, BN normalizes each channel across the batch, so the channel axis must be specified explicitly.
x = tf.layers.batch_normalization(x, axis=3, training=is_training) # NHWC
x = tf.layers.batch_normalization(x, axis=1, training=is_training) # NCHW
Another thing to note about BN: it behaves differently in training and inference. During training its statistics (mean, variance) keep changing, while at inference they are frozen to a moving average of the training statistics.
x = tf.layers.batch_normalization(x, axis=3, training=True) # train
x = tf.layers.batch_normalization(x, axis=3, training=False) # inference
_pool_layer currently supports only two modes, MAX and AVG; more will be added later if needed.
def _pool_layer(self, name, inp, ksize, stride, padding='SAME', mode='MAX'):
'''a pool layer which only supports avg_pooling and max_pooling(default)'''
assert mode in ['MAX', 'AVG'], 'the mode of pool must be MAX or AVG'
if mode == 'MAX':
x = tf.nn.max_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1],
padding=padding, name=name, data_format='NHWC')
elif mode == 'AVG':
x = tf.nn.avg_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1],
padding=padding, name=name, data_format='NHWC')
return x
Pooling, just like conv, has to match data_format (the output-shape arithmetic is sketched after the snippet):
x = tf.nn.max_pool(inp, ksize=[1, ksize, ksize, 1], strides=[1, stride, stride, 1], data_format='NHWC')
x = tf.nn.max_pool(inp, ksize=[1, 1, ksize, ksize], strides=[1, 1, stride, stride], data_format='NCHW')
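The shape comments in build below all follow from the SAME-padding rule: the output spatial size is ceil(input_size / stride). A tiny helper (a sketch, not project code) makes the arithmetic explicit:
import math
def same_out(size, stride):
    # output spatial size of a conv/pool with padding='SAME'
    return math.ceil(size / stride)
print(same_out(32, 2)) # 16, e.g. pool1: 32x32 -> 16x16
print(same_out(4, 4))  # 1,  e.g. pool4: 4x4 -> 1x1 (global average pooling)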
Since an fc layer usually follows a conv or pool layer, the input is flattened first.
def _fc_layer(self, name, inp, units, dropout=0.5):
'''a full connect layer'''
with tf.variable_scope(name) as scope:
shape = inp.get_shape().as_list() # get the shape of input
dim = 1
for d in shape[1:]:
dim *= d
x = tf.reshape(inp, [-1, dim]) # flatten
if dropout > 0: # if with dropout; note that `dropout` is passed to tf.nn.dropout as the keep probability
x = tf.nn.dropout(x, keep_prob=dropout, name='dropout')
x = tf.layers.dense(x, units, kernel_initializer=self.weight_init,
bias_initializer=self.bias_init, kernel_regularizer=self.reg)
return x
The build function composes the basic blocks above (conv, pool, fc) to define the graph.
def build(self):
# set inputs
data = tf.placeholder(tf.float32, shape=(None,)+image_shape+(nr_channel,),
name='data')
label = tf.placeholder(tf.int32, shape=(None,), name='label')
label_onehot = tf.one_hot(label, nr_class, dtype=tf.int32)
is_training = tf.placeholder(tf.bool, name='is_training') # a flag of bn
# conv1
x = self._conv_layer(name='conv1', inp=data,
kernel_shape=[3, 3, nr_channel, 16], stride=1,
is_training=is_training) # Nx32x32x16
x = self._pool_layer(name='pool1', inp=x, ksize=2, stride=2, mode='MAX') # Nx16x16x16
# conv2
x = self._conv_layer(name='conv2a', inp=x, kernel_shape=[3, 3, 16, 32],
stride=1, is_training=is_training)
x = self._conv_layer(name='conv2b', inp=x, kernel_shape=[3, 3, 32, 32],
stride=1, is_training=is_training)
x = self._pool_layer(name='pool2', inp=x, ksize=2, stride=2, mode='MAX') # Nx8x8x32
# conv3
x = self._conv_layer(name='conv3a', inp=x, kernel_shape=[3, 3, 32, 64],
stride=1, is_training=is_training)
x = self._conv_layer(name='conv3b', inp=x, kernel_shape=[3, 3, 64, 64],
stride=1, is_training=is_training)
x = self._pool_layer(name='pool3', inp=x, ksize=2, stride=2, mode='MAX') # Nx4x4x64
# conv4
x = self._conv_layer(name='conv4a', inp=x, kernel_shape=[3, 3, 64, 128],
stride=1, is_training=is_training)
x = self._conv_layer(name='conv4b', inp=x, kernel_shape=[3, 3, 128, 128],
stride=1, is_training=is_training)
x = self._pool_layer(name='pool4', inp=x, ksize=4, stride=4, mode='AVG') # Nx1x1x128
# fc
logits = self._fc_layer(name='fc1', inp=x, units=nr_class, dropout=0)
# softmax
preds = tf.nn.softmax(logits)
placeholders = {
'data': data,
'label': label,
'is_training': is_training,
}
return placeholders, label_onehot, logits, preds
Because the graph is isolated from the outside world while it runs, there has to be an interface through which external data is fed in: the placeholder.
Passing None in the shape means that dimension accepts any size (here, the batch size).
data = tf.placeholder(tf.float32, shape=(None,)+image_shape+(nr_channel,), name='data')
label = tf.placeholder(tf.int32, shape=(None,), name='label')
Training
Once the graph is defined, training can start. This corresponds to train.py in the 01-svhn project.
TensorFlow offers several ways of feeding data; the 01-svhn project pairs a Python generator with the tf.data API.
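For the snippets below to run, train.py needs a handful of imports. This is a sketch assuming the module layout described above (common.py, dataset.py with Dataset, model.py with Model):
import tensorflow as tf

from common import *        # minibatch_size, nr_epoch, test_interval, show_interval, ...
from dataset import Dataset # the Dataset class from dataset.py
from model import Model     # the Model class from model.py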
def get_dataset_batch(ds_name):
'''get a batch generator of dataset'''
dataset = Dataset(ds_name)
ds_gnr = dataset.load().instance_generator
ds = tf.data.Dataset.from_generator(ds_gnr, output_types=(tf.float32, tf.int32))
if ds_name == 'train':
ds = ds.shuffle(dataset.instances_per_epoch)
ds = ds.repeat(nr_epoch)
elif ds_name == 'test':
ds = ds.repeat(nr_epoch // test_interval)
ds = ds.batch(minibatch_size, drop_remainder=True)
ds_iter = ds.make_one_shot_iterator()
sample_gnr = ds_iter.get_next()
return sample_gnr, dataset
To keep training fast, TensorFlow asks for a buffer size when shuffling, and 01-svhn defaults to one epoch's worth of instances. This is why some of you run into memory blow-ups after adding extra_32x32.mat: simply pass a smaller buffer size to avoid it.
ds = ds.shuffle(dataset.instances_per_epoch)
repeat specifies how many passes are made over the dataset; it usually matches the number of training epochs.
ds = ds.repeat(nr_epoch)
This produces a batch generator: each call yields the data in batch form. drop_remainder controls what happens when the final batch has fewer than minibatch_size samples: if True it is dropped, otherwise (the default) the smaller batch is kept. (A quick count follows the snippet below.)
ds = ds.batch(minibatch_size, drop_remainder=True)
ds_iter = ds.make_one_shot_iterator()
sample_gnr = ds_iter.get_next()
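A quick count with the numbers from dataset_meta shows what drop_remainder=True throws away:
print(73257 // 128) # 572 full training batches per pass over the data
print(73257 % 128)  # 41 leftover samples, dropped because drop_remainder=True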
# load datasets
train_batch_gnr, train_set = get_dataset_batch(ds_name='train')
test_batch_gnr, test_set = get_dataset_batch(ds_name='test')
# build a compute graph
network = Model()
placeholders, label_onehot, logits, preds = network.build()
loss_reg = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
loss = tf.losses.softmax_cross_entropy(label_onehot, logits) + loss_reg
# set a performance metric
correct_pred = tf.equal(tf.cast(tf.argmax(preds, 1), dtype=tf.int32),
tf.cast(tf.argmax(label_onehot, 1), dtype=tf.int32))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# learn rate config
global_steps = tf.Variable(0, trainable=False) # a cnt to record the num of minibatchs
boundaries = [train_set.minibatchs_per_epoch*15, train_set.minibatchs_per_epoch*40]
values = [0.01, 0.001, 0.0005]
lr = tf.train.piecewise_constant(global_steps, boundaries, values)
opt = tf.train.AdamOptimizer(lr) # use adam as optimizer
# in order to update BN in every iter, a trick in tf
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train = opt.minimize(loss)
tf.get_collection walks the variables in the graph and gathers those sharing a common key into a list. Variables created with a regularizer carry the REGULARIZATION_LOSSES key, so summing them gives the regularization term of the loss.
loss_reg = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
TensorFlow does not expose a standalone cross_entropy function here; instead softmax and cross-entropy are fused into a single function, so it must be given logits rather than preds.
tf.losses.softmax_cross_entropy(label_onehot, logits)
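Conceptually, the fused op computes the following (a less numerically stable sketch, shown only to make the logits-vs-preds distinction clear; the fused version never materializes log(softmax) explicitly):
probs = tf.nn.softmax(logits)
ce = -tf.reduce_mean(tf.reduce_sum(tf.cast(label_onehot, tf.float32) * tf.log(probs), axis=1))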
將accuracy作為對(duì)模型性能的評(píng)測(cè)標(biāo)準(zhǔn)害淤。
correct_pred = tf.equal(tf.cast(tf.argmax(preds, 1), dtype=tf.int32),
tf.cast(tf.argmax(label_onehot, 1), dtype=tf.int32))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
The learning rate is the most important hyper-parameter when training a NN. 01-svhn anneals it with a step schedule: boundaries gives the minibatch counts at which it changes and values gives the corresponding rates (a worked example follows the snippet).
global_steps = tf.Variable(0, trainable=False) # a cnt to record the num of minibatchs
boundaries = [train_set.minibatchs_per_epoch*15, train_set.minibatchs_per_epoch*40]
values = [0.01, 0.001, 0.0005]
lr = tf.train.piecewise_constant(global_steps, boundaries, values)
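With the fast-experiment setting of 200 minibatches per epoch, boundaries works out to [3000, 8000]. A small pure-Python stand-in (assuming piecewise_constant treats each boundary as inclusive) shows the resulting schedule:
def lr_at(step, boundaries=(3000, 8000), values=(0.01, 0.001, 0.0005)):
    if step <= boundaries[0]:
        return values[0]
    if step <= boundaries[1]:
        return values[1]
    return values[2]

print(lr_at(200 * 10), lr_at(200 * 20), lr_at(200 * 50)) # 0.01 0.001 0.0005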
This selects the optimizer used for training. The 01-svhn project uses Adam; Adam has a few optional parameters, but the only one that must be given is the learning rate.
opt = tf.train.AdamOptimizer(lr) # use adam as optimizer
Implementing BN correctly in TensorFlow is harder than in other frameworks. Because BN's statistics must be updated every minibatch, the updates have to be wired in explicitly with tf.control_dependencies; this guarantees that every run of the train op also applies the latest update_ops.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train = opt.minimize(loss)
TensorFlow uses a session to control the running of the graph.
# create a session
tf.set_random_seed(12345) # ensure consistent results
global_cnt = 0 # a cnt to record the num of minibatchs
with tf.Session() as sess:
sess.run(tf.global_variables_initializer()) # init all variables
# training
for e in range(nr_epoch):
for _ in range(train_set.minibatchs_per_epoch):
global_cnt += 1
images, labels = sess.run(train_batch_gnr) # get a batch of (img, label)
feed_dict = { # assign data to placeholders respectively
placeholders['data']: images,
placeholders['label']: labels,
global_steps: global_cnt,
placeholders['is_training']: True, # in training phase, set True
}
# run train, and output all values you want to monitor
_, loss_v, acc_v, lr_v = sess.run([train, loss, accuracy, lr],
feed_dict=feed_dict)
if global_cnt % show_interval == 0:
print(
"e:{},{}/{}".format(e, global_cnt % train_set.minibatchs_per_epoch,
train_set.minibatchs_per_epoch),
'loss: {:.3f}'.format(loss_v),
'acc: {:.3f}'.format(acc_v),
'lr: {:.3f}'.format(lr_v),
)
# validation
if e % test_interval == 0:
loss_sum, acc_sum = 0, 0 # init
for i in range(test_set.minibatchs_per_epoch):
images, labels = sess.run(test_batch_gnr)
feed_dict = {
placeholders['data']: images,
placeholders['label']: labels,
global_steps: global_cnt,
placeholders['is_training']: False, # in test phase, set False
}
loss_v, acc_v = sess.run([loss, accuracy], feed_dict=feed_dict)
loss_sum += loss_v # update
acc_sum += acc_v # update
print("\n**************Validation results****************")
print('loss_avg: {:.3f}'.format(loss_sum / test_set.minibatchs_per_epoch),
'accuracy_avg: {:.3f}'.format(acc_sum / test_set.minibatchs_per_epoch))
print("************************************************\n")
print('Training is done, exit.')
Before training starts, all variables in the graph need to be initialized:
sess.run(tf.global_variables_initializer()) # init all variables
Running batch_gnr yields a batch of images and labels. This also shows that in TensorFlow nothing actually executes until sess.run is called; everything before that only defines symbols.
images, labels = sess.run(train_batch_gnr) # get a batch of (img, label)
The list passed to sess.run specifies which targets to fetch. During training, the optimization op must be among them, otherwise the loss is never actually optimized. So, a question: if you want to read the current learning rate value at every step, what do you do? (See the sketch after the snippet below.)
train = opt.minimize(loss)
_, loss_v, acc_v = sess.run([train, loss, accuracy], feed_dict=feed_dict)
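The answer, already used in the training loop above, is simply to add lr to the fetch list:
_, loss_v, acc_v, lr_v = sess.run([train, loss, accuracy, lr], feed_dict=feed_dict)
print('current lr: {:.4f}'.format(lr_v))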