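In the previous article, "Training a CNN Classifier with TensorFlow", we built a simple CNN classification model with TensorFlow's low-level functions. The tedious part was the predict function: it took a lot of code to first declare the weights and biases of every layer, and then to stack convolution, activation and pooling operations over and over again while assembling the network. This article introduces a more convenient way to build neural network models.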
1. Building a CNN Model with tf.contrib.slim
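We return to the 10-class classification task from "Training a CNN Classifier with TensorFlow". The only difference is that we now want to replace the predict function with more concise code, which can be done with the tf.contrib.slim module.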
In the tf.contrib.slim module, a convolutional layer is defined with the function:
```python
slim.conv2d(inputs,
            num_outputs,
            kernel_size,
            stride=1,
            padding='SAME',
            data_format=None,
            rate=1,
            activation_fn=nn.relu,
            normalizer_fn=None,
            normalizer_params=None,
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            reuse=None,
            variables_collections=None,
            outputs_collections=None,
            trainable=True,
            scope=None)
```
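Besides the usual parameters, namely the kernel size kernel_size, the padding mode padding, the stride stride and the number of feature maps num_outputs, this function also lets you specify how the weights and biases are initialized and regularized, the activation function, and so on. In other words, a convolutional layer defined with slim needs no separately declared weight and bias variables and no explicit activation step (note that activation_fn defaults to ReLU): both are built into the function. Regularization works in the same spirit: if weights_regularizer is given, slim computes the penalty term and adds it to the tf.GraphKeys.REGULARIZATION_LOSSES collection, although it still has to be added to the training loss, for example via tf.losses.get_regularization_loss().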
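The bookkeeping that slim.conv2d takes over can be made concrete with a few lines of plain Python. This sketch is not part of slim: conv_output_size and conv_param_count are hypothetical helper names, and the formulas assume rate=1 (no dilation) and that a bias is created (normalizer_fn=None). It computes the output size along one spatial dimension for 'SAME' and 'VALID' padding, and the number of entries in the weights variable of shape [kernel_h, kernel_w, in_channels, num_outputs] plus the biases that slim.conv2d would create.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding='SAME'):
    """Output size of a 2-D convolution along one spatial dimension (rate=1)."""
    if padding == 'SAME':
        # SAME pads the input, so the output size depends only on the stride.
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # VALID uses no padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def conv_param_count(kernel_size, in_channels, num_outputs, use_bias=True):
    """Entries in weights [kh, kw, in_channels, num_outputs] plus the biases."""
    kh, kw = kernel_size
    weights = kh * kw * in_channels * num_outputs
    biases = num_outputs if use_bias else 0
    return weights + biases


# A 3x3 convolution with 32 feature maps on a 28x28x3 input.
print(conv_output_size(28, 3, stride=1, padding='SAME'))   # 28
print(conv_output_size(28, 3, stride=1, padding='VALID'))  # 26
print(conv_output_size(28, 3, stride=2, padding='SAME'))   # 14
print(conv_param_count([3, 3], 3, 32))                     # 896
```

For example, a first 3x3 layer with 32 feature maps applied to the 28x28x3 images used later in this article keeps the 28x28 size under the default 'SAME' padding and holds 896 trainable values.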
Similarly, a fully connected layer is defined with slim.fully_connected, and other important operations, including pooling, batch normalization, dropout and flattening, are wrapped in slim.max_pool2d, slim.batch_norm, slim.dropout and slim.flatten respectively. Even more conveniently, to stack several identical layers you can either use a loop, for example to repeat a convolutional layer 3 times:

```python
for i in range(3):
    net = slim.conv2d(net, 256, [3, 3], scope='conv1_{}'.format(i))
```

or use a single, simpler call:

```python
net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv1')
```

The function

```python
slim.repeat(inputs, repetitions, layer, *args, **kwargs)
```

makes building large networks more compact and convenient. Wrappers like these make building convolutional networks in TensorFlow far more convenient, arguably on a par with Keras.
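slim.repeat does more than call the layer in a loop: it also opens a variable scope with the given name and gives each repetition a numbered sub-scope, so slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv1') creates layers named conv1/conv1_1, conv1/conv1_2 and conv1/conv1_3. The pure-Python sketch below imitates only that chaining and naming behaviour so it runs without TensorFlow; toy_repeat and fake_conv2d are hypothetical stand-ins, not slim APIs.

```python
def toy_repeat(inputs, repetitions, layer, *args, **kwargs):
    """Mimics slim.repeat's chaining and scope naming (no TensorFlow needed)."""
    # Simplification: real slim.repeat falls back to the layer's function
    # name when no scope is given.
    scope = kwargs.pop('scope', 'repeat')
    outputs = inputs
    for i in range(repetitions):
        # Each call gets its own numbered scope nested under the outer scope.
        kwargs['scope'] = '{}/{}_{}'.format(scope, scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


created_layers = []


def fake_conv2d(inputs, num_outputs, kernel_size, scope=None):
    """Stand-in for slim.conv2d: records its scope, returns the output shape."""
    created_layers.append(scope)
    height, width, _ = inputs
    return (height, width, num_outputs)  # SAME padding, stride 1


out = toy_repeat((28, 28, 3), 3, fake_conv2d, 256, [3, 3], scope='conv1')
print(out)             # (28, 28, 256)
print(created_layers)  # ['conv1/conv1_1', 'conv1/conv1_2', 'conv1/conv1_3']
```

By contrast, the for loop above names its layers conv1_0, conv1_1 and conv1_2 at the top level, so the two forms create different variable names and their checkpoints are not interchangeable by name.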
Back to the 10-class task from "Training a CNN Classifier with TensorFlow": in that article's model.py we spent many lines building a small CNN with 6 convolutional layers and 3 fully connected layers. With tf.contrib.slim, the model-building function predict can now be rewritten as:
```python
def predict(self, preprocessed_inputs):
    """Predict prediction tensors from inputs tensor.

    Outputs of this function can be passed to loss or postprocess functions.

    Args:
        preprocessed_inputs: A float32 tensor with shape [batch_size,
            height, width, num_channels] representing a batch of images.

    Returns:
        prediction_dict: A dictionary holding prediction tensors to be
            passed to the Loss or Postprocess functions.
    """
    net = preprocessed_inputs
    # slim.conv2d applies ReLU by default (activation_fn=nn.relu).
    net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
    net = slim.flatten(net, scope='flatten')
    net = slim.dropout(net, keep_prob=0.5,
                       is_training=self._is_training)
    net = slim.repeat(net, 2, slim.fully_connected, 512, scope='fc1')
    # No activation on the last layer: it outputs raw logits.
    net = slim.fully_connected(net, self.num_classes,
                               activation_fn=None, scope='fc2')
    prediction_dict = {'logits': net}
    return prediction_dict
```
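This is clearly not only much quicker to write but also makes the model structure more intuitive, arguably on a par with Keras. After the rewrite, the complete model.py reads as follows (this version sets the ReLU activation through slim.arg_scope and names the three fully connected layers fc1, fc2 and fc3):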
```python
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 16:54:02 2018

@author: shirhe-lyh
"""

import tensorflow as tf

slim = tf.contrib.slim


class Model(object):
    """A simple 10-classification CNN model definition."""

    def __init__(self,
                 is_training,
                 num_classes):
        """Constructor.

        Args:
            is_training: A boolean indicating whether the training version of
                computation graph should be constructed.
            num_classes: Number of classes.
        """
        self._num_classes = num_classes
        self._is_training = is_training

    @property
    def num_classes(self):
        return self._num_classes

    def preprocess(self, inputs):
        """Preprocess a batch of images before feeding them to the network.

        Args:
            inputs: A tensor with shape [batch_size, height, width,
                num_channels] holding pixel values in [0, 255].

        Returns:
            preprocessed_inputs: A float32 tensor of the same shape with
                values scaled to [-1, 1).
        """
        # (x - 128) / 128 maps [0, 255] to [-1, 0.9921875].
        preprocessed_inputs = tf.to_float(inputs)
        preprocessed_inputs = tf.subtract(preprocessed_inputs, 128.0)
        preprocessed_inputs = tf.div(preprocessed_inputs, 128.0)
        return preprocessed_inputs

    def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.

        Outputs of this function can be passed to loss or postprocess functions.

        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.

        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        # arg_scope makes activation_fn=tf.nn.relu the default for every
        # slim.conv2d and slim.fully_connected call inside the block.
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            activation_fn=tf.nn.relu):
            net = preprocessed_inputs
            net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
            net = slim.flatten(net, scope='flatten')
            net = slim.dropout(net, keep_prob=0.5,
                               is_training=self._is_training)
            net = slim.fully_connected(net, 512, scope='fc1')
            net = slim.fully_connected(net, 512, scope='fc2')
            # The explicit activation_fn=None overrides the arg_scope default,
            # so the last layer outputs raw logits.
            net = slim.fully_connected(net, self.num_classes,
                                       activation_fn=None, scope='fc3')
        prediction_dict = {'logits': net}
        return prediction_dict

    def postprocess(self, prediction_dict):
        """Convert predicted output tensors to final forms.

        Args:
            prediction_dict: A dictionary holding prediction tensors.

        Returns:
            A dictionary containing the postprocessed results.
        """
        logits = prediction_dict['logits']
        # Softmax does not change the argmax; it only turns logits into
        # probabilities.
        probabilities = tf.nn.softmax(logits)
        classes = tf.cast(tf.argmax(probabilities, axis=1), dtype=tf.int32)
        postprocessed_dict = {'classes': classes}
        return postprocessed_dict

    def loss(self, prediction_dict, groundtruth_lists):
        """Compute scalar loss tensors with respect to provided groundtruth.

        Args:
            prediction_dict: A dictionary holding prediction tensors.
            groundtruth_lists: An int32 tensor of shape [batch_size] holding
                the class label of each image in the batch.

        Returns:
            A dictionary mapping strings (loss names) to scalar tensors
            representing loss values.
        """
        logits = prediction_dict['logits']
        # The softmax is applied inside the loss, so it expects raw logits.
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=groundtruth_lists))
        loss_dict = {'loss': loss}
        return loss_dict
```
Apart from the newly added module declaration slim = tf.contrib.slim and the rewritten predict function, nothing else has changed. Overall, the amount of code has been reduced considerably.
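As a sanity check on the architecture in predict (6 convolutional and 3 fully connected layers), the pure-Python sketch below traces the tensor shape through each layer and counts the trainable parameters. It assumes 28x28x3 inputs as in train.py below, 10 classes, slim's defaults of padding='SAME' and stride 1 for slim.conv2d, and stride 2 with 'VALID' padding for slim.max_pool2d. It does not build a TensorFlow graph: the LAYERS list and the trace helper are hypothetical and transcribed by hand from predict.

```python
# (kind, num_outputs) pairs transcribed from Model.predict in model.py.
LAYERS = [
    ('conv', 32), ('conv', 32), ('pool', None),
    ('conv', 64), ('conv', 64), ('pool', None),
    ('conv', 128), ('conv', 128),
    ('flatten', None),
    ('fc', 512), ('fc', 512), ('fc', 10),
]


def trace(height=28, width=28, channels=3, kernel=3):
    """Return the output shape after each layer and the total parameter count."""
    shape = (height, width, channels)
    shapes = []
    total_params = 0
    for kind, num_outputs in LAYERS:
        if kind == 'conv':
            # 3x3 kernel, SAME padding, stride 1: spatial size unchanged.
            h, w, c = shape
            total_params += kernel * kernel * c * num_outputs + num_outputs
            shape = (h, w, num_outputs)
        elif kind == 'pool':
            # 2x2 max pooling, stride 2, VALID padding: spatial size halves.
            h, w, c = shape
            shape = ((h - 2) // 2 + 1, (w - 2) // 2 + 1, c)
        elif kind == 'flatten':
            h, w, c = shape
            shape = (h * w * c,)
        elif kind == 'fc':
            (n,) = shape
            total_params += n * num_outputs + num_outputs
            shape = (num_outputs,)
        shapes.append(shape)
    return shapes, total_params


shapes, total_params = trace()
for (kind, _), shape in zip(LAYERS, shapes):
    print(kind, shape)
print('trainable parameters:', total_params)  # trainable parameters: 3766570
```

The flattened feature vector has 7 x 7 x 128 = 6272 entries, so roughly 85% of the parameters sit in the first fully connected layer. Dropout, pooling and flattening add no parameters, and the count is the same whether fc1 is built with slim.repeat (as in the first snippet) or as separate fc1 and fc2 layers (as in model.py).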
2. Training and Testing the Model
A network defined with tf.contrib.slim can be trained in two ways: the same way as in "Training a CNN Classifier with TensorFlow", or, again relying on tf.contrib.slim, quickly with the slim.learning.train function. Here we keep the training procedure from that article; a later article will show how to train with the second approach.
The training script train.py is copied from the previous article without any changes:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 19:27:44 2018

@author: shirhe-lyh
"""

"""Train a CNN model to classify 10 digits.

Example Usage:
---------------
python3 train.py \
    --images_path: Path to the training images (directory).
    --model_output_path: Path to model.ckpt.
"""

import cv2
import glob
import numpy as np
import os
import tensorflow as tf

import model

flags = tf.app.flags

flags.DEFINE_string('images_path', None, 'Path to training images.')
flags.DEFINE_string('model_output_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def get_train_data(images_path):
    """Get the training images from images_path.

    Args:
        images_path: Path to training images.

    Returns:
        images: A 4-D array of images.
        labels: A 1-D array of integers representing the classes of images.

    Raises:
        ValueError: If images_path does not exist.
    """
    if not os.path.exists(images_path):
        raise ValueError('images_path does not exist.')

    images = []
    labels = []
    images_path = os.path.join(images_path, '*.jpg')
    count = 0
    for image_file in glob.glob(images_path):
        count += 1
        if count % 100 == 0:
            print('Load {} images.'.format(count))
        image = cv2.imread(image_file)
        # OpenCV loads images as BGR; convert to RGB.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Assume the name of each image is imagexxx_label.jpg
        label = int(image_file.split('_')[-1].split('.')[0])
        images.append(image)
        labels.append(label)
    images = np.array(images)
    labels = np.array(labels)
    return images, labels


def next_batch_set(images, labels, batch_size=128):
    """Generate a batch training data.

    Args:
        images: A 4-D array representing the training images.
        labels: A 1-D array representing the classes of images.
        batch_size: An integer.

    Return:
        batch_images: A batch of images.
        batch_labels: A batch of labels.
    """
    # Sample batch_size indices uniformly at random (with replacement).
    indices = np.random.choice(len(images), batch_size)
    batch_images = images[indices]
    batch_labels = labels[indices]
    return batch_images, batch_labels


def main(_):
    inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 3], name='inputs')
    labels = tf.placeholder(tf.int32, shape=[None], name='labels')

    cls_model = model.Model(is_training=True, num_classes=10)
    preprocessed_inputs = cls_model.preprocess(inputs)
    prediction_dict = cls_model.predict(preprocessed_inputs)
    loss_dict = cls_model.loss(prediction_dict, labels)
    loss = loss_dict['loss']
    postprocessed_dict = cls_model.postprocess(prediction_dict)
    classes = postprocessed_dict['classes']
    # Give the output a fixed name so evaluate.py can fetch 'classes:0'.
    classes_ = tf.identity(classes, name='classes')
    acc = tf.reduce_mean(tf.cast(tf.equal(classes, labels), 'float'))

    global_step = tf.Variable(0, trainable=False)
    # learning_rate = 0.05 * 0.9 ** (global_step / 150)
    learning_rate = tf.train.exponential_decay(0.05, global_step, 150, 0.9)

    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_step = optimizer.minimize(loss, global_step)

    saver = tf.train.Saver()
    images, targets = get_train_data(FLAGS.images_path)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(6000):
            batch_images, batch_labels = next_batch_set(images, targets)
            train_dict = {inputs: batch_images, labels: batch_labels}

            sess.run(train_step, feed_dict=train_dict)

            # Loss and accuracy on the same batch, after the update.
            loss_, acc_ = sess.run([loss, acc], feed_dict=train_dict)

            train_text = 'step: {}, loss: {}, acc: {}'.format(
                i+1, loss_, acc_)
            print(train_text)

        saver.save(sess, FLAGS.model_output_path)


if __name__ == '__main__':
    tf.app.run()
```
In the directory containing the file, run the following in a terminal:

```shell
python3 train.py --images_path /home/.../datasets/images \
                 --model_output_path /home/.../model.ckpt
```

The whole training process is printed, for example:
```text
step: 1, loss: 2.302396535873413, acc: 0.1640625
step: 2, loss: 2.302823066711426, acc: 0.0859375
step: 3, loss: 2.3024234771728516, acc: 0.1171875
step: 4, loss: 2.302684783935547, acc: 0.0546875
step: 5, loss: 2.3024277687072754, acc: 0.109375
step: 6, loss: 2.3024179935455322, acc: 0.0859375
step: 7, loss: 2.302734851837158, acc: 0.0703125
step: 8, loss: 2.3025729656219482, acc: 0.0859375
step: 9, loss: 2.3026342391967773, acc: 0.1171875
step: 10, loss: 2.3026227951049805, acc: 0.1171875
step: 11, loss: 2.3024468421936035, acc: 0.0859375
step: 12, loss: 2.302351236343384, acc: 0.140625
step: 13, loss: 2.302664279937744, acc: 0.1015625
step: 14, loss: 2.302532434463501, acc: 0.1171875
step: 15, loss: 2.3025684356689453, acc: 0.1015625
step: 16, loss: 2.302473306655884, acc: 0.0703125
step: 17, loss: 2.30285382270813, acc: 0.078125
step: 18, loss: 2.302445411682129, acc: 0.0859375
step: 19, loss: 2.302391290664673, acc: 0.0859375
step: 20, loss: 2.3027210235595703, acc: 0.109375
...
step: 5981, loss: 1.4615014791488647, acc: 1.0
step: 5982, loss: 1.46712064743042, acc: 1.0
step: 5983, loss: 1.4673535823822021, acc: 1.0
step: 5984, loss: 1.46533203125, acc: 0.9921875
step: 5985, loss: 1.4692511558532715, acc: 0.9921875
step: 5986, loss: 1.4615371227264404, acc: 1.0
step: 5987, loss: 1.461196780204773, acc: 1.0
step: 5988, loss: 1.4663658142089844, acc: 1.0
step: 5989, loss: 1.467726707458496, acc: 0.9921875
step: 5990, loss: 1.4727323055267334, acc: 0.9921875
step: 5991, loss: 1.461942434310913, acc: 1.0
step: 5992, loss: 1.461172342300415, acc: 1.0
step: 5993, loss: 1.4619064331054688, acc: 1.0
step: 5994, loss: 1.466255784034729, acc: 0.9921875
step: 5995, loss: 1.4612611532211304, acc: 1.0
step: 5996, loss: 1.4613593816757202, acc: 1.0
step: 5997, loss: 1.4761428833007812, acc: 0.984375
step: 5998, loss: 1.4681826829910278, acc: 0.9921875
step: 5999, loss: 1.4703295230865479, acc: 0.9921875
step: 6000, loss: 1.4703948497772217, acc: 0.9921875
```
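Judging from the output above, the accuracy on the training batches has settled at 99% or higher on almost every step, while the loss has levelled off between 1.46 and 1.47 (these numbers can be compared with the results of training with the tf.contrib.slim module in a later article; the two should be close).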
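A few reference values help read this log. At step 1 the loss is about 2.3026, which is ln 10, the cross-entropy of a uniform guess over 10 classes. The 1.46 plateau is higher than a network that emits raw logits must stop at: it coincides with ln(e + 9) - 1 ≈ 1.4612, the smallest cross-entropy reachable when the values fed to sparse_softmax_cross_entropy_with_logits are themselves probabilities in [0, 1] (softmax applied twice). Whether the logged run actually did that is an inference from the numbers, not something the article states; with the model.py above, whose last layer has activation_fn=None, the loss is not bounded this way. The batch accuracies are also multiples of 1/128 because batch_size=128, so 0.9921875 means 127 of 128 images were classified correctly. The sketch below only evaluates these formulas in plain Python; cross_entropy and softmax are hypothetical helpers written for illustration, not TensorFlow functions.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example: log(sum(exp(z))) - z[label]."""
    log_norm = math.log(sum(math.exp(z) for z in logits))
    return log_norm - logits[label]


def softmax(logits):
    """Convert logits to probabilities."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


num_classes = 10

# Uniform prediction (all logits equal): loss = ln(10).
print(cross_entropy([0.0] * num_classes, 0))  # ~2.302585

# If the "logits" are already probabilities, the best case is the one-hot
# vector [1, 0, ..., 0], giving ln(e + 9) - 1.
one_hot = [1.0] + [0.0] * (num_classes - 1)
print(cross_entropy(one_hot, 0))  # ~1.4612

# Softmax applied twice: even a very confident network stays at that floor.
probs = softmax([20.0] + [0.0] * (num_classes - 1))
print(cross_entropy(probs, 0))

# With raw logits the loss keeps shrinking, e.g. a margin of 10:
print(cross_entropy([10.0] + [0.0] * (num_classes - 1), 0))  # ~0.00041

# Accuracy granularity for batch_size=128 (next_batch_set's default).
print(127 / 128)  # 0.9921875
```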
The test script evaluate.py is also copied from the previous article:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 2 14:02:05 2018

@author: shirhe-lyh
"""

import numpy as np
import tensorflow as tf

from captcha.image import ImageCaptcha

flags = tf.app.flags
flags.DEFINE_string('model_ckpt_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def generate_captcha(text='1'):
    """Render text as a 28x28 RGB captcha image (uint8 array)."""
    capt = ImageCaptcha(width=28, height=28, font_sizes=[24])
    image = capt.generate_image(text)
    image = np.array(image, dtype=np.uint8)
    return image


def main(_):
    with tf.Session() as sess:
        ckpt_path = FLAGS.model_ckpt_path
        # Rebuild the saved graph from the .meta file, then load the weights.
        # Note: that graph was built with is_training=True, so its dropout
        # layer is still active here.
        saver = tf.train.import_meta_graph(ckpt_path + '.meta')
        saver.restore(sess, ckpt_path)

        # Fetch the tensors named 'inputs' and 'classes' in train.py.
        inputs = tf.get_default_graph().get_tensor_by_name('inputs:0')
        classes = tf.get_default_graph().get_tensor_by_name('classes:0')

        for i in range(10):
            label = np.random.randint(0, 10)
            image = generate_captcha(str(label))
            # Add a batch dimension: [28, 28, 3] -> [1, 28, 28, 3].
            image_np = np.expand_dims(image, axis=0)
            predicted_label = sess.run(classes,
                                       feed_dict={inputs: image_np})
            print(predicted_label, ' vs ', label)


if __name__ == '__main__':
    tf.app.run()
```
Run

```shell
python3 evaluate.py --model_ckpt_path /home/.../model.ckpt
```

to see the trained model in action. For example, two of my runs printed:
```text
[0] vs 0
[6] vs 6
[7] vs 7
[8] vs 8
[2] vs 2
[4] vs 4
[3] vs 3
[5] vs 5
[7] vs 7
[8] vs 8

[3] vs 3
[6] vs 6
[2] vs 2
[2] vs 2
[2] vs 2
[7] vs 7
[9] vs 9
[6] vs 6
[4] vs 4
[5] vs 5
```
All 20 predictions match their labels, so the model works reasonably well.
Coming up: the next article will show how to convert image files into .record files, in preparation for training models with the tf.contrib.slim module.