Training a CNN Classification Model with TensorFlow-slim

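In the previous article, "Training a CNN Classifier with TensorFlow", we built a simple CNN classification model from TensorFlow's low-level functions. The tedious part was the predict function: it took a lot of code to declare the weights and biases of every layer first, and then the convolution, activation, and pooling operations had to be stacked by hand again and again. This article introduces a more convenient way to build neural network models.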

1. Building a CNN model with tf.contrib.slim

We return to the 10-class task from "Training a CNN Classifier with TensorFlow". The only difference is that we want to replace the predict function with more concise code, which the tf.contrib.slim module makes possible.

In tf.contrib.slim, a convolutional layer is defined with the function:

slim.conv2d(inputs,
            num_outputs,
            kernel_size,
            stride=1,
            padding='SAME',
            data_format=None,
            rate=1,
            activation_fn=nn.relu,
            normalizer_fn=None,
            normalizer_params=None,
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            reuse=None,
            variables_collections=None,
            outputs_collections=None,
            trainable=True,
            scope=None)

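Besides the usual kernel size kernel_size, padding mode padding, stride stride, and number of feature maps num_outputs, this function also accepts the initializers and regularizers for the weights and biases, the activation function, and more. In other words, a convolutional layer defined with slim needs no separately declared weight and bias variables and no explicit activation or regularization step; all of that is built into the function.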
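To make the effect of padding and stride concrete, here is a minimal pure-Python sketch of the output-size rule TensorFlow applies for the two padding modes (with dilation rate 1). The helper name conv_output_size is made up for this illustration and is not part of slim.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding='SAME'):
    """Spatial output size of a convolution along one dimension.

    Hypothetical helper for illustration only; it mirrors TensorFlow's rule
    for 'SAME' and 'VALID' padding but is not a slim function.
    """
    if padding == 'SAME':
        # The input is zero-padded, so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError('padding must be SAME or VALID')


# A 3x3 kernel applied to a side of 28 pixels.
print(conv_output_size(28, 3, stride=1, padding='SAME'))   # 28
print(conv_output_size(28, 3, stride=1, padding='VALID'))  # 26
print(conv_output_size(28, 3, stride=2, padding='SAME'))   # 14
```

With the defaults of slim.conv2d (stride=1, padding='SAME'), a convolution therefore keeps the height and width of its input and changes only the number of channels.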

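Similarly, fully connected layers are defined with slim.fully_connected. Other important operations, including pooling, batch normalization, dropout, and flattening, are wrapped as slim.max_pool2d, slim.batch_norm, slim.dropout, and slim.flatten. Even more conveniently, several identical layers can be stacked either with a loop, for example to repeat a convolutional layer 3 times: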

for i in range(3):
    net = slim.conv2d(net, 256, [3, 3], scope='conv1_{}'.format(i))

or, more simply, with a single call:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv1')

The function:

slim.repeat(inputs, repetitions, layer, *args, **kwargs)

makes building large neural networks more compact and convenient. With these wrappers, building convolutional networks in TensorFlow becomes far more convenient, arguably on par with Keras.
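The behaviour of slim.repeat can be sketched in plain Python. This is a simplified stand-in, not slim's implementation: the names repeat and fake_layer are hypothetical, and the real function also opens a TensorFlow variable scope. The sketch shows that each repetition feeds its output into the next call and that the layers end up scoped as conv1/conv1_1, conv1/conv1_2, conv1/conv1_3, whereas the explicit loop above names them conv1_0, conv1_1, conv1_2.

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Apply `layer` to `inputs` `repetitions` times, chaining the outputs.

    Simplified sketch of slim.repeat: scope handling is reduced to building
    a name string for each call.
    """
    scope = kwargs.pop('scope', 'Repeat')
    outputs = inputs
    for i in range(repetitions):
        # Each call gets its own numbered scope nested under the shared one.
        kwargs['scope'] = '{0}/{0}_{1}'.format(scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def fake_layer(inputs, num_outputs, kernel_size, scope=None):
    """Hypothetical stand-in for slim.conv2d that only records the call."""
    return inputs + [(scope, num_outputs, tuple(kernel_size))]


calls = repeat([], 3, fake_layer, 256, [3, 3], scope='conv1')
for call in calls:
    print(call)
```

Because the variable names differ between the loop and slim.repeat, a checkpoint written by one form would presumably not load into a graph built with the other without renaming.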

Back to the 10-class task from "Training a CNN Classifier with TensorFlow": in that article's model.py we spent a lot of space building a small CNN with 6 convolutional layers and 3 fully connected layers. Now we can rewrite the model-building function predict with tf.contrib.slim:

def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.
        
        Outputs of this function can be passed to loss or postprocess functions.
        
        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.
            
        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        net = preprocessed_inputs
        net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
        net = slim.flatten(net, scope='flatten')
        net = slim.dropout(net, keep_prob=0.5, 
                           is_training=self._is_training)
        net = slim.repeat(net, 2, slim.fully_connected, 512, scope='fc1')
        net = slim.fully_connected(net, self.num_classes, 
                                   activation_fn=None, scope='fc2')
        prediction_dict = {'logits': net}
        return prediction_dict

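Clearly, this is not only much quicker to write but also makes the model structure easier to read, and it is about as concise as the equivalent Keras code. After this rewrite, the whole model.py file is as follows: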

# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 16:54:02 2018

@author: shirhe-lyh
"""

import tensorflow as tf

slim = tf.contrib.slim
    
        
class Model(object):
    """A simple 10-classification CNN model definition."""
    
    def __init__(self,
                 is_training,
                 num_classes):
        """Constructor.
        
        Args:
            is_training: A boolean indicating whether the training version of
                computation graph should be constructed.
            num_classes: Number of classes.
        """
        self._num_classes = num_classes
        self._is_training = is_training

    @property
    def num_classes(self):
        return self._num_classes
        
    def preprocess(self, inputs):
        """Preprocess the input images before they are passed to predict.

        Casts the pixel values to float32 and rescales them from [0, 255] to
        roughly [-1, 1) by subtracting 128 and dividing by 128.

        Args:
            inputs: A tensor with shape [batch_size, height, width,
                num_channels] representing a batch of images.

        Returns:
            preprocessed_inputs: A float32 tensor with the same shape as
                inputs.
        """
        preprocessed_inputs = tf.to_float(inputs)
        preprocessed_inputs = tf.subtract(preprocessed_inputs, 128.0)
        preprocessed_inputs = tf.div(preprocessed_inputs, 128.0)
        return preprocessed_inputs
    
    def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.
        
        Outputs of this function can be passed to loss or postprocess functions.
        
        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.
            
        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            activation_fn=tf.nn.relu):
            net = preprocessed_inputs
            net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
            net = slim.flatten(net, scope='flatten')
            net = slim.dropout(net, keep_prob=0.5, 
                               is_training=self._is_training)
            net = slim.fully_connected(net, 512, scope='fc1')
            net = slim.fully_connected(net, 512, scope='fc2')
            net = slim.fully_connected(net, self.num_classes, 
                                       activation_fn=None, scope='fc3')
        prediction_dict = {'logits': net}
        return prediction_dict
    
    def postprocess(self, prediction_dict):
        """Convert predicted output tensors to final forms.

        Args:
            prediction_dict: A dictionary holding prediction tensors.

        Returns:
            A dictionary containing the postprocessed results.
        """
        logits = prediction_dict['logits']
        # Softmax is monotonic, so argmax over the probabilities selects the
        # same class as argmax over the raw logits.
        probabilities = tf.nn.softmax(logits)
        classes = tf.cast(tf.argmax(probabilities, axis=1), dtype=tf.int32)
        postprocessed_dict = {'classes': classes}
        return postprocessed_dict
    
    def loss(self, prediction_dict, groundtruth_lists):
        """Compute scalar loss tensors with respect to provided groundtruth.
        
        Args:
            prediction_dict: A dictionary holding prediction tensors.
            groundtruth_lists: An int32 tensor with shape [batch_size]
                holding the class index of each image in the batch.
                
        Returns:
            A dictionary mapping strings (loss names) to scalar tensors
                representing loss values.
        """
        logits = prediction_dict['logits']
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=groundtruth_lists))
        loss_dict = {'loss': loss}
        return loss_dict

Apart from the newly added declaration of the slim module:
slim = tf.contrib.slim
and the rewritten predict function, the code above is unchanged. Overall, far less code is needed.
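As a sanity check on the architecture, the tensor shapes produced by predict can be traced by hand. The sketch below assumes the 28x28x3 input used by the training script in the next section, the default 'SAME' padding and stride 1 of slim.conv2d, and the default stride 2 and 'VALID' padding of slim.max_pool2d. The helper names are hypothetical.

```python
def conv_same(shape, num_outputs):
    """3x3 convolution, stride 1, 'SAME' padding: only the channels change."""
    height, width, _ = shape
    return (height, width, num_outputs)


def max_pool_2x2(shape):
    """2x2 max pooling, stride 2, 'VALID' padding: halves height and width."""
    height, width, channels = shape
    return ((height - 2) // 2 + 1, (width - 2) // 2 + 1, channels)


shape = (28, 28, 3)               # input image
shape = conv_same(shape, 32)      # conv1 (two layers, same output shape)
shape = max_pool_2x2(shape)       # pool1
shape = conv_same(shape, 64)      # conv2 (two layers)
shape = max_pool_2x2(shape)       # pool2
shape = conv_same(shape, 128)     # conv3 (two layers)
flattened = shape[0] * shape[1] * shape[2]

print(shape)      # (7, 7, 128)
print(flattened)  # 6272
```

Under these assumptions fc1 receives a 6272-dimensional vector per image, and fc3 maps the last 512-dimensional vector to the 10 class logits.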

2. Model training and testing

A model defined with tf.contrib.slim can be trained in two ways: the same way as in "Training a CNN Classifier with TensorFlow", or by staying with tf.contrib.slim and calling slim.learning.train for a quick implementation. Here we keep the training approach from the previous article; a later article will show how to train with the second approach.

The training file train.py is as follows, copied from the previous article without any changes:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Train a CNN model to classify 10 digits.

Created on Fri Mar 30 19:27:44 2018

@author: shirhe-lyh

Example Usage:
---------------
python3 train.py \
    --images_path: Path to the training images (directory).
    --model_output_path: Path to model.ckpt.
"""

import cv2
import glob
import numpy as np
import os
import tensorflow as tf

import model

flags = tf.app.flags

flags.DEFINE_string('images_path', None, 'Path to training images.')
flags.DEFINE_string('model_output_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def get_train_data(images_path):
    """Get the training images from images_path.

    Args:
        images_path: Path to training images.

    Returns:
        images: An array of images.
        labels: An array of integers representing the classes of images.

    Raises:
        ValueError: If images_path does not exist.
    """
    if not os.path.exists(images_path):
        raise ValueError('images_path does not exist.')
        
    images = []
    labels = []
    images_path = os.path.join(images_path, '*.jpg')
    count = 0
    for image_file in glob.glob(images_path):
        count += 1
        if count % 100 == 0:
            print('Load {} images.'.format(count))
        image = cv2.imread(image_file)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Assume the name of each image is imagexxx_label.jpg
        label = int(image_file.split('_')[-1].split('.')[0])
        images.append(image)
        labels.append(label)
    images = np.array(images)
    labels = np.array(labels)
    return images, labels


def next_batch_set(images, labels, batch_size=128):
    """Generate a batch of training data.

    Args:
        images: A 4-D array representing the training images.
        labels: A 1-D array representing the classes of images.
        batch_size: An integer.

    Return:
        batch_images: A batch of images.
        batch_labels: A batch of labels.
    """
    # np.random.choice samples with replacement by default, so an image can
    # appear more than once in the same batch.
    indices = np.random.choice(len(images), batch_size)
    batch_images = images[indices]
    batch_labels = labels[indices]
    return batch_images, batch_labels


def main(_):
    inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 3], name='inputs')
    labels = tf.placeholder(tf.int32, shape=[None], name='labels')
    
    cls_model = model.Model(is_training=True, num_classes=10)
    preprocessed_inputs = cls_model.preprocess(inputs)
    prediction_dict = cls_model.predict(preprocessed_inputs)
    loss_dict = cls_model.loss(prediction_dict, labels)
    loss = loss_dict['loss']
    postprocessed_dict = cls_model.postprocess(prediction_dict)
    classes = postprocessed_dict['classes']
    # Name the output tensor so that evaluate.py can fetch it as 'classes:0'.
    classes_ = tf.identity(classes, name='classes')
    acc = tf.reduce_mean(tf.cast(tf.equal(classes, labels), tf.float32))
    
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(0.05, global_step, 150, 0.9)
    
    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_step = optimizer.minimize(loss, global_step)
    
    saver = tf.train.Saver()
    
    images, targets = get_train_data(FLAGS.images_path)
    
    init = tf.global_variables_initializer()
    
    with tf.Session() as sess:
        sess.run(init)
        
        for i in range(6000):
            batch_images, batch_labels = next_batch_set(images, targets)
            train_dict = {inputs: batch_images, labels: batch_labels}
            
            sess.run(train_step, feed_dict=train_dict)
            
            loss_, acc_ = sess.run([loss, acc], feed_dict=train_dict)
            
            train_text = 'step: {}, loss: {}, acc: {}'.format(
                i+1, loss_, acc_)
            print(train_text)
            
        saver.save(sess, FLAGS.model_output_path)
        
    
if __name__ == '__main__':
    tf.app.run()
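The learning-rate schedule set up in main can be reproduced in plain Python to see how quickly it decays. This sketch assumes the default staircase=False of tf.train.exponential_decay, that is, learning_rate * decay_rate ** (global_step / decay_steps). The function name decayed_learning_rate is hypothetical.

```python
def decayed_learning_rate(step, base_rate=0.05, decay_steps=150,
                          decay_rate=0.9):
    """Continuous exponential decay, as with staircase=False."""
    return base_rate * decay_rate ** (step / decay_steps)


# Print the learning rate at a few points of the 6000-step run.
for step in (0, 150, 3000, 6000):
    print(step, decayed_learning_rate(step))
```

Over the 6000 steps the rate is multiplied by 0.9 forty times, so it ends at roughly 7.4e-4, far below the initial 0.05.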

In a terminal opened in that file's directory, run:

python3 train.py --images_path /home/.../datasets/images \
                 --model_output_path /home/.../model.ckpt

to see the full training output, for example:

step: 1, loss: 2.302396535873413, acc: 0.1640625
step: 2, loss: 2.302823066711426, acc: 0.0859375
step: 3, loss: 2.3024234771728516, acc: 0.1171875
step: 4, loss: 2.302684783935547, acc: 0.0546875
step: 5, loss: 2.3024277687072754, acc: 0.109375
step: 6, loss: 2.3024179935455322, acc: 0.0859375
step: 7, loss: 2.302734851837158, acc: 0.0703125
step: 8, loss: 2.3025729656219482, acc: 0.0859375
step: 9, loss: 2.3026342391967773, acc: 0.1171875
step: 10, loss: 2.3026227951049805, acc: 0.1171875
step: 11, loss: 2.3024468421936035, acc: 0.0859375
step: 12, loss: 2.302351236343384, acc: 0.140625
step: 13, loss: 2.302664279937744, acc: 0.1015625
step: 14, loss: 2.302532434463501, acc: 0.1171875
step: 15, loss: 2.3025684356689453, acc: 0.1015625
step: 16, loss: 2.302473306655884, acc: 0.0703125
step: 17, loss: 2.30285382270813, acc: 0.078125
step: 18, loss: 2.302445411682129, acc: 0.0859375
step: 19, loss: 2.302391290664673, acc: 0.0859375
step: 20, loss: 2.3027210235595703, acc: 0.109375

···

step: 5981, loss: 1.4615014791488647, acc: 1.0
step: 5982, loss: 1.46712064743042, acc: 1.0
step: 5983, loss: 1.4673535823822021, acc: 1.0
step: 5984, loss: 1.46533203125, acc: 0.9921875
step: 5985, loss: 1.4692511558532715, acc: 0.9921875
step: 5986, loss: 1.4615371227264404, acc: 1.0
step: 5987, loss: 1.461196780204773, acc: 1.0
step: 5988, loss: 1.4663658142089844, acc: 1.0
step: 5989, loss: 1.467726707458496, acc: 0.9921875
step: 5990, loss: 1.4727323055267334, acc: 0.9921875
step: 5991, loss: 1.461942434310913, acc: 1.0
step: 5992, loss: 1.461172342300415, acc: 1.0
step: 5993, loss: 1.4619064331054688, acc: 1.0
step: 5994, loss: 1.466255784034729, acc: 0.9921875
step: 5995, loss: 1.4612611532211304, acc: 1.0
step: 5996, loss: 1.4613593816757202, acc: 1.0
step: 5997, loss: 1.4761428833007812, acc: 0.984375
step: 5998, loss: 1.4681826829910278, acc: 0.9921875
step: 5999, loss: 1.4703295230865479, acc: 0.9921875
step: 6000, loss: 1.4703948497772217, acc: 0.9921875

From this output, accuracy has stabilized above 99% and the loss has stabilized between 1.46 and 1.47 (this can be compared with the results of training through tf.contrib.slim later; the two should be close).
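Two reference values help in reading these numbers. The sketch below computes the softmax cross-entropy by hand in plain Python (the function name is hypothetical). Uniform scores over 10 classes give ln 10, about 2.3026, which matches the loss at the first steps. If the values fed to the softmax were themselves confined to [0, 1], the loss could not fall below ln(e + 9) - 1, about 1.4612, which is close to the plateau seen here. Whether that is what happened in the logged run is only a possible reading, not something the article confirms; the model.py listed above passes raw logits to the loss.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and the one-hot `label`."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[label]


# Uniform scores over 10 classes: the model is guessing.
print(softmax_cross_entropy([0.0] * 10, label=3))

# Best case when the scores are limited to [0, 1]: 1 for the true class,
# 0 for the other nine.
print(softmax_cross_entropy([1.0] + [0.0] * 9, label=0))
```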

The test code evaluate.py is also copied from the previous article:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Apr  2 14:02:05 2018

@author: shirhe-lyh
"""

import numpy as np
import tensorflow as tf

from captcha.image import ImageCaptcha

flags = tf.app.flags
flags.DEFINE_string('model_ckpt_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def generate_captcha(text='1'):
    capt = ImageCaptcha(width=28, height=28, font_sizes=[24])
    image = capt.generate_image(text)
    image = np.array(image, dtype=np.uint8)
    return image


def main(_):
    with tf.Session() as sess:
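        ckpt_path = FLAGS.model_ckpt_path
        # Rebuild the graph saved by train.py from the .meta file. Note that
        # this is the training graph (is_training=True), so its dropout layer
        # stays active during evaluation.
        saver = tf.train.import_meta_graph(ckpt_path + '.meta')
        saver.restore(sess, ckpt_path)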
        inputs = tf.get_default_graph().get_tensor_by_name('inputs:0')
        classes = tf.get_default_graph().get_tensor_by_name('classes:0')
        
        for i in range(10):
            label = np.random.randint(0, 10)
            image = generate_captcha(str(label))
            image_np = np.expand_dims(image, axis=0)
            predicted_label = sess.run(classes, 
                                       feed_dict={inputs: image_np})
            print(predicted_label, ' vs ', label)
            
            
if __name__ == '__main__':
    tf.app.run()

Run

python3 evaluate.py --model_ckpt_path /home/.../model.ckpt

to see the trained model in action. For example, two of my runs printed:

[0]  vs  0
[6]  vs  6
[7]  vs  7
[8]  vs  8
[2]  vs  2
[4]  vs  4
[3]  vs  3
[5]  vs  5
[7]  vs  7
[8]  vs  8


[3]  vs  3
[6]  vs  6
[2]  vs  2
[2]  vs  2
[2]  vs  2
[7]  vs  7
[9]  vs  9
[6]  vs  6
[4]  vs  4
[5]  vs  5

The results are decent: all 20 predictions in these two runs match their labels.

Coming up: the next article will show how to convert image files into .record files, in preparation for training the model with tf.contrib.slim.
