Abstract
The VGG network took first place in localization and second place in classification at the ILSVRC 2014 challenge. Its authors come from the Visual Geometry Group at the University of Oxford (presumably the origin of the name VGG). The paper's main contribution is its study of how important depth is to a network: large convolution kernels are replaced with small ones and the network is made deeper, yielding 16-layer and 19-layer networks (VGG16 and VGG19). These networks are now very widely used in classification, detection, and keypoint localization; object-detection algorithms such as YOLO, SSD, and S3FD, and face-keypoint localization algorithms such as DAN, all adopt VGG16 as their feature-extraction network.
Network Performance
Network Origins
- Receptive Field
The receptive field is, in a convolutional neural network (CNN), the size of the region of the input that determines a single element of a given layer's output. For example, for a convolution layer with a 7x7 kernel, each element of its output feature map corresponds to a 7x7 region of that layer's input; that region is its receptive field. Receptive fields also accumulate across layers: the receptive field of every layer in a CNN is measured with respect to the input of the first layer. Two points to keep in mind when computing it:
(1) The receptive field of a pixel in the output feature map of the first convolution layer equals the filter size.
(2) The receptive field of a deeper convolution layer depends on the filter sizes and strides of all preceding layers.
See the reference links at the end of this article for the detailed receptive-field calculation.
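The two rules above can be turned into a few lines of Python (a minimal sketch; the `receptive_field` helper and its `(kernel_size, stride)` input format are illustrative, not from the original post):

```python
def receptive_field(layers):
    """Accumulate the receptive field layer by layer.

    layers: list of (kernel_size, stride) tuples, ordered from the first
            layer (closest to the input) to the last.
    Returns the receptive field of one output element w.r.t. the input.
    """
    rf = 1       # receptive field of a single input pixel
    jump = 1     # distance, in input pixels, between adjacent outputs
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf


# Three stacked 3x3 convolutions with stride 1 cover a 7x7 region,
# the same as a single 7x7 convolution.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(7, 1)]))                  # 7
```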
- Small vs. Large Convolution Kernels and the Receptive Field
Earlier networks such as AlexNet open with a single large convolution kernel, whose receptive field is simply its own size (a 7x7 kernel sees a 7x7 patch). By the receptive-field rule above, with the same stride, two stacked 3x3 kernels have the same receptive field as one 5x5 kernel, and three stacked 3x3 kernels have the same receptive field as one 7x7 kernel. In terms of parameters, three 3x3 convolution layers use fewer parameters than one 7x7 layer (assuming the input and output channel counts are both C, the counts are 3x(3x3xCxC) = 27C^2 vs. 1x(7x7xCxC) = 49C^2). Concretely, the receptive fields of the first three convolutions of VGG are:
Receptive field of the first 3x3 convolution: 3x3
Receptive field of the second: (3-1) x 1 + 3 = 5
Receptive field of the third: (5-1) x 1 + 3 = 7
So three 3x3 kernels cover the same receptive field as a single 7x7 kernel, while the 3x3 kernels allow the network to be made deeper. A drawback of VGGNet is that it consumes more computing resources and uses more parameters, which leads to a larger memory footprint.
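A quick numeric check of that comparison (an illustrative snippet; C = 64 is an arbitrary example and bias terms are ignored):

```python
C = 64                                    # example channel count (illustrative)
params_three_3x3 = 3 * (3 * 3 * C * C)    # 27 * C^2 = 110592
params_one_7x7 = 1 * (7 * 7 * C * C)      # 49 * C^2 = 200704
print(params_three_3x3, params_one_7x7)
```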
Network Architecture
The VGG architecture is very simple. It uses conv2d-relu-BatchNorm as the basic unit (BatchNorm is not in the original paper, but it is commonly added in re-implementations such as the ones below); several (2/3/4) of these units form a group (vgg_block), each group is followed by a 2x2 max-pool for downsampling, and stacking several groups produces VGG networks of different depths. To keep the parameter count of such a deep network down, every convolution in the network uses kernel=3x3, stride=1. For classification, the output of the last group passes through a max-pool, then flatten, dropout, three fully connected layers, and a softmax, which produces the class label and its confidence. The original ImageNet-pretrained model was trained with the Caffe framework; TensorFlow, MXNet, PyTorch and others now ship their own VGG pretrained models, so the pretrained weights of the important units can be transferred directly and fine-tuned for a new vision task (see the fine-tuning sketch after the table below). VGG16 and VGG19 both contain five vgg_blocks and differ only in the number of basic units per block: 2+2+3+3+3 for VGG16 and 2+2+4+4+4 for VGG19; adding the three fully connected layers gives 16 and 19 weight layers respectively. This article focuses on VGG16, whose detailed structure is as follows:
--------------------------------------------------
layer | kh x kw, out, s | out size
--------------------------------------------------
input image (224 x 224 x 3)
--------------------------------------------------
conv1_1 | 3x3, 64, 1 | 224x224x64
conv1_2 | 3x3, 64, 1 | 224x224x64
--------------------------------------------------
max_pool | 2x2, 64,2 | 112x112x64
--------------------------------------------------
conv2_1 | 3x3, 128, 1 | 112x112x128
conv2_2 | 3x3, 128, 1 | 112x112x128
--------------------------------------------------
max_pool | 2x2, 128,2 | 56x56x128
--------------------------------------------------
conv3_1 | 3x3, 256, 1 | 56x56x256
conv3_2 | 3x3, 256, 1 | 56x56x256
conv3_3 | 3x3, 256, 1 | 56x56x256
--------------------------------------------------
max_pool | 2x2, 256,2 | 28x28x256
--------------------------------------------------
conv4_1 | 3x3, 512, 1 | 28x28x512
conv4_2 | 3x3, 512, 1 | 28x28x512
conv4_3 | 3x3, 512, 1 | 28x28x512
--------------------------------------------------
max_pool | 2x2, 512,2 | 14x14x512
--------------------------------------------------
conv5_1 | 3x3, 512, 1 | 14x14x512
conv5_2 | 3x3, 512, 1 | 14x14x512
conv5_3 | 3x3, 512, 1 | 14x14x512
--------------------------------------------------
max_pool | 2x2, 512,2 | 7x7x512
--------------------------------------------------
fc6 | 4096 | 1x1x4096
fc7 | 4096 | 1x1x4096
fc8 | 1000 | 1x1x1000
Softmax | Classifier | 1x1x1000
--------------------------------------------------
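As noted above, the pretrained weights are usually transferred and fine-tuned rather than retrained from scratch. A minimal fine-tuning sketch, assuming a TensorFlow build that ships tf.keras.applications (the new head layers and the 10-class task here are illustrative, not part of the original post):

```python
import tensorflow as tf

# Load the VGG16 convolutional layers with ImageNet weights, drop the FC head.
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,
                                   input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False          # freeze the transferred features

# Hypothetical new head for a 10-class task.
x = tf.keras.layers.Flatten()(base.output)
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dropout(0.5)(x)
out = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```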
Code Implementation
There are three main Python APIs for building CNN networks in tensorflow (hereafter tf):
- tf.nn
- tf.layers
- tf.contrib.layers
The level of encapsulation increases in that order: defining a convolution layer with tf.nn is the most verbose, while tf.contrib.layers is considerably simpler and more convenient. It is worth noting that tf.contrib plays well with Python's higher-level language features, so the code that defines a network can be simplified a great deal and becomes more readable and pythonic; tf.contrib.slim (TF-Slim) is popular for the same reason, and many mature network implementations are built on it. More detailed explanations can be found in the official TensorFlow documentation. Since different users work with different tf API modules, their VGG implementations also differ; they are collected and compared here for reference.
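To see what "increasing levels of encapsulation" means in practice, here is the same 3x3, 64-channel conv-relu layer written against each API (a condensed, hypothetical comparison, not part of the original code; x is any NHWC float tensor):

```python
import tensorflow as tf
import tensorflow.contrib.layers as tcl

x = tf.placeholder(tf.float32, [None, 224, 224, 3])

# tf.nn: the kernel and bias variables must be created by hand
w = tf.get_variable("w", [3, 3, 3, 64])
b = tf.get_variable("b", [64], initializer=tf.zeros_initializer())
y0 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w, [1, 1, 1, 1], "SAME"), b))

# tf.layers: one call, with the activation passed as an argument
y1 = tf.layers.conv2d(x, filters=64, kernel_size=3, padding="same",
                      activation=tf.nn.relu)

# tf.contrib.layers: relu is already the default activation_fn
y2 = tcl.conv2d(x, num_outputs=64, kernel_size=3)
```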
- Method 0
The tf.nn module is TensorFlow's core module for deep-learning computation; it contains the individual operators, headed by conv2d, pool and relu, for convolution, pooling, activation and so on. Taking 2-D image convolution as an example, tf.nn.conv2d takes a 4-D input ([batch, h, w, channel]), a 4-D kernel ([kh, kw, in_channel, out_channel]) and a stride (int or list) and performs the 2-D convolution; the kernel and the bias have to be constructed beforehand with the right shape. The usual approach is to wrap this yourself in Python: define a function that automatically builds a convolution layer from parameters such as kh, kw and out_channel (conv_op in the code below). Fully connected layers and max pooling are constructed the same way. The full code is as follows:
# --------------------------Method 0 --------------------------------------------
import tensorflow as tf


# Create a convolution layer and append its parameters to the parameter list
def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    """
    define conv operator with tf.nn
    :param input_op: input tensor
    :param name: name of this layer
    :param kh: kernel height
    :param kw: kernel width
    :param n_out: number of output channels
    :param dh: stride height
    :param dw: stride width
    :param p: parameter list
    :return:
    """
    # number of input channels
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + "w", shape=[kh, kw, n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding='SAME')
        bias_init_val = tf.constant(0.0, shape=[n_out], dtype=tf.float32)
        biases = tf.Variable(bias_init_val, trainable=True, name='b')
        z = tf.nn.bias_add(conv, biases)
        activation = tf.nn.relu(z, name=scope)
        p += [kernel, biases]
        return activation


# Fully connected layer
def fc_op(input_op, name, n_out, p):
    """
    define fully connected operator with tf.nn
    :param input_op: input tensor
    :param name: name of this layer
    :param n_out: number of output channels
    :param p: parameter list
    :return:
    """
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(scope + 'w', shape=[n_in, n_out], dtype=tf.float32,
                                 initializer=tf.contrib.layers.xavier_initializer_conv2d())
        biases = tf.Variable(tf.constant(0.1, shape=[n_out], dtype=tf.float32), name='b')
        # tf.nn.relu_layer() multiplies input_op by kernel, adds the bias, and applies relu
        activation = tf.nn.relu_layer(input_op, kernel, biases, name=scope)
        p += [kernel, biases]
        return activation


# Max-pooling layer
def mpool_op(input_op, name, kh, kw, dh, dw):
    return tf.nn.max_pool(input_op, ksize=[1, kh, kw, 1], strides=[1, dh, dw, 1], padding='SAME', name=name)


# Network definition, Method 0
def vgg16_op(input_op, keep_prob):
    p = []
    conv1_1 = conv_op(input_op, name='conv1_1', kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    conv1_2 = conv_op(conv1_1, name='conv1_2', kh=3, kw=3, n_out=64, dh=1, dw=1, p=p)
    pool1 = mpool_op(conv1_2, name='pool1', kh=2, kw=2, dw=2, dh=2)
    conv2_1 = conv_op(pool1, name='conv2_1', kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    conv2_2 = conv_op(conv2_1, name='conv2_2', kh=3, kw=3, n_out=128, dh=1, dw=1, p=p)
    pool2 = mpool_op(conv2_2, name='pool2', kh=2, kw=2, dw=2, dh=2)
    conv3_1 = conv_op(pool2, name='conv3_1', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_2 = conv_op(conv3_1, name='conv3_2', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    conv3_3 = conv_op(conv3_2, name='conv3_3', kh=3, kw=3, n_out=256, dh=1, dw=1, p=p)
    pool3 = mpool_op(conv3_3, name='pool3', kh=2, kw=2, dw=2, dh=2)
    conv4_1 = conv_op(pool3, name='conv4_1', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_2 = conv_op(conv4_1, name='conv4_2', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv4_3 = conv_op(conv4_2, name='conv4_3', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool4 = mpool_op(conv4_3, name='pool4', kh=2, kw=2, dw=2, dh=2)
    conv5_1 = conv_op(pool4, name='conv5_1', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_2 = conv_op(conv5_1, name='conv5_2', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    conv5_3 = conv_op(conv5_2, name='conv5_3', kh=3, kw=3, n_out=512, dh=1, dw=1, p=p)
    pool5 = mpool_op(conv5_3, name='pool5', kh=2, kw=2, dw=2, dh=2)
    shp = pool5.get_shape()
    print("pool5 shape ", shp)
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name="resh1")
    fc6 = fc_op(resh1, name="fc6", n_out=4096, p=p)
    fc6_drop = tf.nn.dropout(fc6, keep_prob, name='fc6_drop')
    fc7 = fc_op(fc6_drop, name="fc7", n_out=4096, p=p)
    fc7_drop = tf.nn.dropout(fc7, keep_prob, name="fc7_drop")
    fc8 = fc_op(fc7_drop, name="fc8", n_out=1000, p=p)
    softmax = tf.nn.softmax(fc8)
    predictions = tf.argmax(softmax, 1)
    return predictions, softmax, fc8, p
- Method 1
tf.layers模塊屬于TensorFlow的一個穩(wěn)定的中層API,算是tf.nn模塊的抽象材失,封裝了Conv2D, Dense痕鳍,BatchNormalization,Conv2DTranspose等類和conv2d等函數(shù)龙巨,極大地加快了模型的構(gòu)建速度笼呆。如卷積層的構(gòu)建可以使用conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding="same", strides=1, activation=tf.nn.relu)
一行代碼實現(xiàn),同時還可以直接指定卷積后激活的函數(shù)旨别∈模基于該模塊的VGG16代碼實現(xiàn)如下:
# --------------------------Method 1 --------------------------------------------
import tensorflow as tf
import tensorflow.contrib.layers as tcl  # needed for tcl.flatten / tcl.dropout below


class VGG1:
    """
    define with tf.layers
    """

    def __init__(self, resolution_inp=224, channel=3, name='vgg'):
        """
        constructor
        :param resolution_inp: int, size of input image. default 224 of ImageNet
        :param channel: int, channel of input image. 1 or 3
        :param name:
        """
        self.name = name
        self.channel = channel
        self.resolution_inp = resolution_inp

    def __call__(self, x, dropout=0.5, is_training=True):
        with tf.variable_scope(self.name) as scope:
            size = 64
            se = self.vgg_block(x, 2, size, is_training=is_training)
            se = self.vgg_block(se, 2, size * 2, is_training=is_training)
            se = self.vgg_block(se, 3, size * 4, is_training=is_training)
            se = self.vgg_block(se, 3, size * 8, is_training=is_training)
            se = self.vgg_block(se, 3, size * 8, is_training=is_training)
            flatten = tcl.flatten(se)
            fc6 = tf.layers.dense(flatten, 4096)
            fc6_drop = tcl.dropout(fc6, dropout, is_training=is_training)
            fc7 = tf.layers.dense(fc6_drop, 4096)
            fc7_drop = tcl.dropout(fc7, dropout, is_training=is_training)
            self.fc_out = tf.layers.dense(fc7_drop, 1000)
            # prediction for classification
            softmax = tf.nn.softmax(self.fc_out)
            self.predictions = tf.argmax(softmax, 1)
            return self.predictions

    def vgg_block(self, x, num_convs, num_channels, scope=None, is_training=True):
        """
        define the basic repeated unit in vgg: n x (conv-relu-batchnorm)-maxpool
        :param x: tensor or numpy.array, input
        :param num_convs: int, number of conv-relu-batchnorm
        :param num_channels: int, number of conv filters
        :param scope: name space or scope
        :param is_training: bool, is training or not
        :return:
        """
        with tf.variable_scope(scope, "conv"):
            se = x
            # conv-relu-batchnorm group
            for i in range(num_convs):
                se = tf.layers.conv2d(se,
                                      filters=num_channels,
                                      kernel_size=3,
                                      padding="same",
                                      strides=1,
                                      activation=tf.nn.relu)
                se = tf.layers.batch_normalization(se,
                                                   training=is_training,
                                                   scale=True)
            se = tf.layers.max_pooling2d(se, 2, 2, padding="same")
            return se

    @property
    def trainable_vars(self):
        return [var for var in tf.trainable_variables() if self.name in var.name]
- Method 2
tf.contrib.layers is a further wrapper around tf.layers; tf.contrib.layers.conv2d, for instance, adds batch_norm-related parameters. The tf.contrib framework also provides many pythonic utilities, such as the arg_scope context manager, which applies the same settings (activation, batch_norm, and so on) to several convolution layers at once and makes the code cleaner. The popular slim module adds a few more features on top of this, but its usage is largely the same, so it is not covered here. Note, however, that many of these modules are not natively supported by TensorFlow core: the TensorFlow 2.0 announcement explicitly states that much of tf.contrib will be moved into other modules or deprecated, at which point this code may no longer work as-is. The VGG16 implementation based on this module is as follows:
# --------------------------Method 2 --------------------------------------------
import tensorflow as tf
import tensorflow.contrib.layers as tcl
from tensorflow.contrib.framework import arg_scope


class VGG2:
    """
    define with tf.contrib.layers
    """

    def __init__(self, resolution_inp=224, channel=3, name='vgg'):
        self.name = name
        self.channel = channel
        self.resolution_inp = resolution_inp

    def __call__(self, x, dropout=0.5, is_training=True):
        with tf.variable_scope(self.name) as scope:
            with arg_scope([tcl.batch_norm], is_training=is_training, scale=True):
                with arg_scope([tcl.conv2d],
                               padding="SAME",
                               normalizer_fn=tcl.batch_norm,
                               activation_fn=tf.nn.relu, ):
                    size = 64
                    se = self.vgg_block(x, 2, size, is_training=is_training)
                    se = self.vgg_block(se, 2, size * 2, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 4, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 8, is_training=is_training)
                    se = self.vgg_block(se, 3, size * 8, is_training=is_training)
                    flatten = tcl.flatten(se)
                    fc6 = tf.layers.dense(flatten, 4096)
                    fc6_drop = tcl.dropout(fc6, dropout, is_training=is_training)
                    print("dropout ", fc6, fc6_drop)
                    fc7 = tf.layers.dense(fc6_drop, 4096)
                    fc7_drop = tcl.dropout(fc7, dropout, is_training=is_training)
                    self.fc_out = tf.layers.dense(fc7_drop, 1000)
                    # prediction for classification
                    softmax = tf.nn.softmax(self.fc_out)
                    self.predictions = tf.argmax(softmax, 1)
                    return self.predictions

    def vgg_block(self, x, num_convs, num_channels, scope=None, is_training=True):
        """
        define the basic repeated unit in vgg: n x (conv-relu-batchnorm)-maxpool
        :param x: tensor or numpy.array, input
        :param num_convs: int, number of conv-relu-batchnorm
        :param num_channels: int, number of conv filters
        :param scope: name space or scope
        :param is_training: bool, is training or not
        :return:
        """
        with tf.variable_scope(scope, "conv"):
            se = x
            for i in range(num_convs):
                se = tcl.conv2d(se, num_outputs=num_channels, kernel_size=3, stride=1)
            se = tf.layers.max_pooling2d(se, 2, 2, padding="same")
            print("layer ", self.name, "in ", x, "out ", se)
            return se

    @property
    def trainable_vars(self):
        return [var for var in tf.trainable_variables() if self.name in var.name]

    @property
    def vars(self):
        return [var for var in tf.global_variables() if self.name in var.name]
Running
This part consists of two pieces. The timing function time_tensorflow_run takes a tf.Session, the tensor to evaluate, the corresponding feed dict and a message string, and measures the time needed to run that tensor 100 times (mean and standard deviation). The main function run_benchmark builds VGG16 with any of the three approaches above (Method 2 is active in the code below, the others are commented out) and measures the time the network spends on inference (the forward pass) and on gradient computation (the backward pass). The full code is as follows:
# -------------------------- Demo and Test -------------------------------------------
from datetime import datetime
import tensorflow as tf
import math
import time

batch_size = 16
num_batches = 100


def time_tensorflow_run(session, target, feed, info_string):
    """
    calculate time for each session run
    :param session: tf.Session
    :param target: operator or tensor need to run with session
    :param feed: feed dict for session
    :param info_string: info message for print
    :return:
    """
    num_steps_burn_in = 10  # warm-up iterations
    total_duration = 0.0  # total elapsed time
    total_duration_squared = 0.0  # sum of squared durations, used to compute the variance
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict=feed)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:  # only count iterations after the warm-up
            if not i % 10:
                print('[%s] step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches  # mean time per batch
    vr = total_duration_squared / num_batches - mn * mn  # variance
    sd = math.sqrt(vr)  # standard deviation
    print('[%s] %s across %d steps, %.3f +/- %.3f sec/batch' % (datetime.now(), info_string, num_batches, mn, sd))


# test demo
def run_benchmark():
    """
    main function for test or demo
    :return:
    """
    with tf.Graph().as_default():
        image_size = 224  # input image size
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        keep_prob = tf.placeholder(tf.float32)
        # method 0
        # prediction, softmax, fc8, p = vgg16_op(images, keep_prob)
        # method 1 and method 2
        # vgg16 = VGG1(resolution_inp=image_size, name="vgg16")
        vgg16 = VGG2(resolution_inp=image_size, name="vgg16")
        prediction = vgg16(images, 0.5, True)
        fc8 = vgg16.fc_out
        p = vgg16.trainable_vars
        for v in p:
            print(v)
        init = tf.global_variables_initializer()
        # for var in tf.global_variables():
        #     print("param ", var.name)
        sess = tf.Session()
        print("init...")
        sess.run(init)
        print("predict..")
        writer = tf.summary.FileWriter("./logs")
        writer.add_graph(sess.graph)
        time_tensorflow_run(sess, prediction, {keep_prob: 1.0}, "Forward")
        # simulate the training process
        objective = tf.nn.l2_loss(fc8)  # a dummy loss
        grad = tf.gradients(objective, p)  # gradients of the loss w.r.t. all model parameters
        print('grad backward')
        time_tensorflow_run(sess, grad, {keep_prob: 0.5}, "Forward-backward")
        writer.close()


if __name__ == '__main__':
    run_benchmark()
Note: the full code is available in my personal GitHub project.
Parameter Count
The total number of parameters is roughly 138M.
Parameter count of the fully connected layers:
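Most of those parameters sit in the fully connected layers. A rough breakdown reconstructed from the structure table above (a sketch; bias terms are ignored, which is why the total comes out slightly under the exact figure):

```python
# Weight counts per layer for VGG16, reconstructed from the table above.
conv_cfg = [(3, 64), (64, 64), (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_params = sum(3 * 3 * cin * cout for cin, cout in conv_cfg)

fc6 = 7 * 7 * 512 * 4096      # ~102.8M
fc7 = 4096 * 4096             # ~16.8M
fc8 = 4096 * 1000             # ~4.1M
fc_params = fc6 + fc7 + fc8

print(conv_params / 1e6)                 # ~14.7M
print(fc_params / 1e6)                   # ~123.6M
print((conv_params + fc_params) / 1e6)   # ~138.3M in total
```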
Time Efficiency
References
Project homepage
https://blog.csdn.net/wcy12341189/article/details/56281618
https://blog.csdn.net/App_12062011/article/details/60962978
https://blog.csdn.net/zhangwei15hh/article/details/78417789
Receptive field
Receptive field calculation