MobileNet is a classification network for mobile devices proposed by Google. In V1, MobileNet applied depthwise separable convolutions (depth-wise separable convolution) and introduced two hyperparameters to control network capacity; the assumption behind this kind of convolution is that cross-channel correlations and spatial correlations can be decoupled. Depthwise separable convolutions save parameters and reach competitive accuracy while keeping the model complexity acceptable for mobile devices. In V2, MobileNet adopts a new unit, the inverted residual with linear bottleneck; the main changes are a linear activation on the bottleneck output and moving the skip connections of residual networks to the low-dimensional bottleneck layers.
Paper: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
GitHub: https://github.com/xiaochus/MobileNetV2
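To make the depthwise separable idea concrete, here is a minimal sketch of my own (not from the paper or the repository) that factors a standard 3 * 3 convolution into a depthwise step and a pointwise step, using the same Keras 2.1.x API as the implementation below; the input shape and filter count are arbitrary examples.

from keras.models import Model
from keras.layers import Input, Conv2D
from keras.applications.mobilenet import DepthwiseConv2D

# Example input: a 112 x 112 feature map with 32 channels.
inputs = Input(shape=(112, 112, 32))
# Depthwise step: one 3x3 filter per input channel, covering spatial correlations.
x = DepthwiseConv2D((3, 3), padding='same')(inputs)
# Pointwise step: a 1x1 convolution that mixes channels, covering cross-channel correlations.
x = Conv2D(64, (1, 1), padding='same')(x)
model = Model(inputs, x)
# Weight count: 3*3*32 = 288 (depthwise) + 1*1*32*64 = 2048 (pointwise),
# versus 3*3*32*64 = 18432 for a standard 3x3 convolution (biases ignored).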
Network Structure
The overall structure of MobileNetV2 is shown in the figure below. Each row describes a sequence of one or more identical layers (apart from the stride), with each bottleneck repeated n times. All layers in the same sequence have the same number of output channels. The first layer of each sequence uses stride s, and all the other layers use stride 1. All spatial convolutions use 3 * 3 kernels. The expansion factor t is always applied to the input size: if the tensor entering a layer has k channels, then k * t filters are applied in that layer (for example, a bottleneck whose input has 24 channels and t = 6 expands it to 144 channels).
The structure of the bottleneck is shown below; a skip connection is used only when the stride is 1 and the input and output have the same number of channels.
Environment
- OpenCV 3.4
- Python 3.5
- Tensorflow-gpu 1.2.0
- Keras 2.1.3
Implementation
Based on the parameters given in the paper, I implemented the network structure with Keras 2, as shown below:
from keras.models import Model
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dropout
from keras.layers import Activation, BatchNormalization, add, Reshape
from keras.applications.mobilenet import relu6, DepthwiseConv2D
from keras.utils.vis_utils import plot_model
from keras import backend as K
def _conv_block(inputs, filters, kernel, strides):
    """Convolution Block
    This function defines a 2D convolution operation with BN and relu6.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        strides: An integer or tuple/list of 2 integers,
            specifying the strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.

    # Returns
        Output tensor.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x = Conv2D(filters, kernel, padding='same', strides=strides)(inputs)
    x = BatchNormalization(axis=channel_axis)(x)
    return Activation(relu6)(x)
def _bottleneck(inputs, filters, kernel, t, s, r=False):
    """Bottleneck
    This function defines a basic bottleneck structure.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        s: An integer or tuple/list of 2 integers, specifying the strides
            of the convolution along the width and height. Can be a single
            integer to specify the same value for all spatial dimensions.
        r: Boolean, whether to use the residual connection.

    # Returns
        Output tensor.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    # Expand the input channels by the factor t.
    tchannel = K.int_shape(inputs)[channel_axis] * t

    x = _conv_block(inputs, tchannel, (1, 1), (1, 1))

    # Depthwise 3x3 convolution; the stride s controls spatial downsampling.
    x = DepthwiseConv2D(kernel, strides=(s, s), depth_multiplier=1, padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)
    x = Activation(relu6)(x)

    # Linear 1x1 projection back to the bottleneck width (no activation).
    x = Conv2D(filters, (1, 1), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)

    if r:
        x = add([x, inputs])

    return x
def _inverted_residual_block(inputs, filters, kernel, t, strides, n):
    """Inverted Residual Block
    This function defines a sequence of 1 or more identical layers.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        strides: An integer or tuple/list of 2 integers, specifying the strides
            of the convolution along the width and height. Can be a single
            integer to specify the same value for all spatial dimensions.
        n: Integer, layer repeat times.

    # Returns
        Output tensor.
    """
    # The first layer of the sequence uses the given stride and cannot use a
    # residual connection, because its input and output shapes differ.
    x = _bottleneck(inputs, filters, kernel, t, strides)

    # The remaining n - 1 layers use stride 1 and a residual connection.
    for i in range(1, n):
        x = _bottleneck(x, filters, kernel, t, 1, True)

    return x
def MobileNetv2(input_shape, k):
    """MobileNetv2
    This function defines a MobileNetv2 architecture.

    # Arguments
        input_shape: An integer or tuple/list of 3 integers, shape
            of input tensor.
        k: Integer, number of classes.

    # Returns
        MobileNetv2 model.
    """
    inputs = Input(shape=input_shape)
    x = _conv_block(inputs, 32, (3, 3), strides=(2, 2))

    x = _inverted_residual_block(x, 16, (3, 3), t=1, strides=1, n=1)
    x = _inverted_residual_block(x, 24, (3, 3), t=6, strides=2, n=2)
    x = _inverted_residual_block(x, 32, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 64, (3, 3), t=6, strides=2, n=4)
    x = _inverted_residual_block(x, 96, (3, 3), t=6, strides=1, n=3)
    x = _inverted_residual_block(x, 160, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 320, (3, 3), t=6, strides=1, n=1)

    x = _conv_block(x, 1280, (1, 1), strides=(1, 1))
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 1280))(x)
    x = Dropout(0.3, name='Dropout')(x)
    # A 1x1 convolution over the 1x1 feature map acts as the fully connected classifier.
    x = Conv2D(k, (1, 1), padding='same')(x)

    x = Activation('softmax', name='softmax')(x)
    output = Reshape((k,))(x)

    model = Model(inputs, output)
    plot_model(model, to_file='images/MobileNetv2.png', show_shapes=True)

    return model
if __name__ == '__main__':
    MobileNetv2((224, 224, 3), 1000)
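As a quick sanity check (my addition, not part of the original script), the model can be built and inspected as below; the total parameter count should be close to the roughly 3.4M that the paper reports for the 1.0/224 model. Note that plot_model requires pydot and graphviz, and an existing images/ directory.

model = MobileNetv2((224, 224, 3), 1000)
model.summary()                  # layer-by-layer output shapes
print(model.count_params())      # total number of parameters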
Training
The input size recommended in the paper is 224 * 224, so the training set should preferably use the same size. The file data\convert.py provides an example of upscaling the cifar-100 data to 224.
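As a rough sketch of what such a conversion involves (my own code, not the repository's convert.py; the upscale helper is hypothetical and uses the OpenCV version from the environment list), resizing 32 x 32 cifar-100 images to 224 x 224 looks something like this:

import cv2
import numpy as np
from keras.datasets import cifar100

(x_train, y_train), (x_test, y_test) = cifar100.load_data()

def upscale(images, size=224):
    """Resize a batch of 32x32 images to size x size with bilinear interpolation."""
    return np.stack([cv2.resize(img, (size, size), interpolation=cv2.INTER_LINEAR)
                     for img in images])

x_train_224 = upscale(x_train[:128])  # resize in small batches to limit memory use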
The training dataset should be organized in the following layout:
| - data/
    | - train/
        | - class 0/
            | - image.jpg
            ....
        | - class 1/
            ....
        | - class n/
    | - validation/
        | - class 0/
        | - class 1/
        ....
        | - class n/
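With the directories laid out like this, a Keras generator can feed the network directly. The snippet below is my own sketch of how the data might be loaded (not necessarily what train.py does; the batch size and rescaling are example choices):

from keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1. / 255)
val_gen = ImageDataGenerator(rescale=1. / 255)

# The class sub-folders under data/train and data/validation become the labels.
train_flow = train_gen.flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=128, class_mode='categorical')
val_flow = val_gen.flow_from_directory(
    'data/validation', target_size=(224, 224), batch_size=128, class_mode='categorical')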
Run the following command to train the model:
python train.py --classes num_classes --batch batch_size --epochs epochs --size image_size
The trained .h5
weight files are saved in the model folder. If you want to fine-tune an existing model, use the command below. Note that only the number of output classes in the final layer can be changed; the structure of the other layers must stay the same.
python train.py --classes num_classes --batch batch_size --epochs epochs --size image_size --weights weights_path --tclasses pre_classes
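Conceptually, fine-tuning amounts to rebuilding the network with the old number of classes, loading the saved weights, and then swapping in a new classifier head. The sketch below is my own reconstruction of that idea (the rebuild_classifier helper is hypothetical and is not the repository's actual fine-tune code); it assumes the MobileNetv2 function defined above is in scope.

from keras.models import Model
from keras.layers import Conv2D, Activation, Reshape

def rebuild_classifier(weights_path, tclasses, classes):
    """Load weights trained with tclasses outputs and attach a fresh classes-way head."""
    base = MobileNetv2((224, 224, 3), tclasses)
    base.load_weights(weights_path)
    # Reuse everything up to the Dropout layer, then add a new 1x1 classifier.
    x = base.get_layer('Dropout').output
    x = Conv2D(classes, (1, 1), padding='same')(x)
    x = Activation('softmax')(x)
    output = Reshape((classes,))(x)
    return Model(base.input, output)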
Parameters
- --classes, the number of classes in the current training set.
- --size, the image size.
- --batch, the batch size.
- --epochs, the number of epochs.
- --weights, the model weights to fine-tune.
- --tclasses, the number of output classes of the pre-trained model.
Experiments
Due to resource constraints, we ran the experiment on the cifar-100 dataset for a limited number of epochs.
device: Tesla K80
dataset: cifar-100
optimizer: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
batch_size: 128
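For reference, these settings translate into Keras calls along the following lines, reusing the generators sketched in the Training section. This is my own reconstruction, not necessarily how train.py is written; the loss and the epoch count are example choices.

from keras.optimizers import Adam

model = MobileNetv2((224, 224, 3), 100)  # cifar-100 has 100 classes
model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_flow,
                    steps_per_epoch=train_flow.samples // 128,
                    validation_data=val_flow,
                    validation_steps=val_flow.samples // 128,
                    epochs=100)  # example value; the post does not state the exact number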
The results are shown below; although the network has not fully converged, it still achieves decent accuracy.
Dataset | Loss | Top-1 Accuracy | Top-5 Accuracy |
---|---|---|---|
cifar-100 | 0.195 | 94.42% | 99.82% |