MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Abstract

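MobileNet is a lightweight network architecture for mobile and embedded vision applications, from authors at Google. Its contribution is to replace the standard convolution with a depthwise convolution plus a 1x1 convolution in order to build lightweight deep neural networks. The authors run extensive experiments on the trade-off between resources and accuracy, and the model performs well against other well-known models on the ImageNet classification task.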

Background

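Ways to make a network smaller:
(1) Kernel factorization: replace an N×N kernel with a 1×N and an N×1 kernel
(2) Bottleneck structures, with SqueezeNet as the typical example
(3) Storing weights in low-precision floating point, e.g. Deep Compression
(4) Pruning redundant kernels and Huffman coding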
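Standard 3D convolution
Standard 3D convolution takes a multi-channel image (M channels, M > 1) and N distinct KxK kernels and convolves them; this is what a convolutional layer in a deep neural network does. The procedure is:
(1) For one kernel, run a 2D convolution against each channel of the image; M channels give M results. (Each kernel therefore carries one KxK slice per input channel.)
(2) Sum these M results element-wise to get a single map.
(3) Repeat steps (1)-(2) for each of the N kernels to get an output with N channels, i.e. the feature map.
The cost of this process, where H x W is the spatial size of the feature map, is: ((K x K x H x W) x M) x N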
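
Steps (1)-(3) can be sketched in plain Python. This is only an illustration, not the author's code: the helper names (`conv2d_valid`, `standard_conv`, `standard_conv_macs`) are made up here, images are nested lists, and the sketch assumes stride 1 with no padding.

```python
def conv2d_valid(channel, kernel):
    """2D cross-correlation of one H x W channel with one K x K slice (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(channel) - k + 1
    out_w = len(channel[0]) - k + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def standard_conv(image, kernels):
    """image: M channels; kernels: N filters, each holding M slices of size K x K."""
    feature_map = []
    for filt in kernels:  # step (3): repeat for each of the N filters
        # step (1): one 2D convolution per input channel
        per_channel = [conv2d_valid(ch, sl) for ch, sl in zip(image, filt)]
        out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
        # step (2): element-wise sum of the M per-channel results
        summed = [[sum(r[i][j] for r in per_channel) for j in range(out_w)]
                  for i in range(out_h)]
        feature_map.append(summed)
    return feature_map


def standard_conv_macs(k, h, w, m, n):
    """Multiply-accumulate count for an h x w output: ((K*K*H*W)*M)*N."""
    return k * k * h * w * m * n
```

For example, `standard_conv_macs(3, 112, 112, 32, 64)` gives the cost of a 3x3 layer mapping 32 channels to 64 on a 112x112 feature map.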

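Figure: standard 3D convolution
Depthwise convolution
Depthwise convolution (depthwise conv) is simply step (1) of the standard 3D convolution: each of the M channels is convolved with its own KxK kernel, giving M results, with no summation across channels. (The MobileNet paper uses the name "depthwise separable convolution" for this step combined with the 1x1 convolution described next.)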
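
A minimal sketch of the depthwise step, again in plain Python with hypothetical helper names and assuming stride 1 and no padding. The point it illustrates is that channel c is only ever touched by kernel c, so the output keeps M channels.

```python
def conv2d_valid(channel, kernel):
    """2D cross-correlation of one channel with one K x K kernel (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(channel) - k + 1
    out_w = len(channel[0]) - k + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def depthwise_conv(image, kernels):
    """Apply kernels[c] to image[c] only; nothing is summed across channels."""
    assert len(image) == len(kernels), "one kernel per input channel"
    return [conv2d_valid(ch, k) for ch, k in zip(image, kernels)]


def depthwise_macs(k, h, w, m):
    """Multiply-accumulate count for an h x w output: (K*K*H*W)*M."""
    return k * k * h * w * m
```

Compared with the standard convolution, the factor N is gone from the cost, because no output filter bank is involved at this stage.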

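Figure: depthwise convolution
1x1 convolution
1x1 convolution originated in the Network in Network paper. It is mainly used for channel compression, i.e. changing the number of channels of a feature map, and was later also used in place of fully connected layers to reduce computation.
For an image with M channels, N 1x1 convolutions cost: ((1 x 1 x H x W) x M) x N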
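
A 1x1 convolution is a per-pixel linear combination of the input channels. The sketch below shows this in plain Python; the function names and the `weights[n][c]` layout are choices made for this illustration only.

```python
def pointwise_conv(image, weights):
    """image: M channels of H x W; weights: N rows of M coefficients. Returns N channels."""
    h, w = len(image[0]), len(image[0][0])
    return [[[sum(wt * image[c][i][j] for c, wt in enumerate(row))
              for j in range(w)]
             for i in range(h)]
            for row in weights]


def pointwise_macs(h, w, m, n):
    """Multiply-accumulate count: ((1*1*H*W)*M)*N."""
    return h * w * m * n
```

Choosing N smaller than M compresses the channels; choosing N larger expands them.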

The MobileNet approach
A standard convolution can be replaced by a depthwise convolution followed by a 1x1 convolution. The cost of the pair is:
(K x K x H x W) x M + ((1 x 1 x H x W) x M) x N = H x W x (K x K + N) x M

Relative to the standard convolution, the cost ratio is:
(H x W x M x (K x K + N) )/(H x W x K x K x M x N) = 1/N + 1/(K x K)
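
The ratio can be checked numerically. The sketch below uses exact fractions so the identity holds without rounding; the function names are illustrative, and the example layer sizes are just one plausible choice.

```python
from fractions import Fraction


def standard_cost(k, h, w, m, n):
    """Standard convolution: K*K*H*W*M*N multiply-accumulates."""
    return k * k * h * w * m * n


def separable_cost(k, h, w, m, n):
    """Depthwise (K*K*H*W*M) plus 1x1 (H*W*M*N) multiply-accumulates."""
    return k * k * h * w * m + h * w * m * n


def cost_ratio(k, h, w, m, n):
    """Exact ratio separable / standard, which simplifies to 1/N + 1/(K*K)."""
    return Fraction(separable_cost(k, h, w, m, n), standard_cost(k, h, w, m, n))
```

With K = 3 and N = 512, `cost_ratio(3, 14, 14, 512, 512)` is 521/4608, roughly 0.113, so the pair needs between 8 and 9 times fewer multiply-accumulates than the standard layer. The 1/(K x K) term dominates once N is large.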

Network architecture

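MobileNet's structure is very simple. The first layer is a standard 3x3 convolution (stride=2). After it, a depthwise convolution plus a 1x1 convolution (conv_dw + conv1x1) serves as the basic unit, and several such units are chained to form networks of different depth. In the paper, both the depthwise and the 1x1 convolution are followed by batch norm and ReLU, and downsampling is done by setting stride=2 in conv_dw. At the end, average pooling takes the place of large fully connected layers; for the ImageNet classification task a single fully connected layer (units=1000) and softmax output the class and its confidence.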

MobileNet structure
-------------------------------------------------- 
layer        | kh x kw, out, s | out size 
-------------------------------------------------- 
         input image (224 x 224 x3)
-------------------------------------------------- 
conv         | 3x3, 32, 2      | 112x112x32
-------------------------------------------------- 
conv_dw      | 3x3, 32dw, 1    | 112x112x32 
conv1x1      | 1x1, 64, 1      | 112x112x64
-------------------------------------------------- 
conv_dw      | 3x3, 64dw, 2    | 56x56x64 
conv1x1      | 1x1, 128, 1     | 56x56x128
-------------------------------------------------- 
conv_dw      | 3x3, 128dw, 1   | 56x56x128 
conv1x1      | 1x1, 128, 1     | 56x56x128 
-------------------------------------------------- 
conv_dw      | 3x3, 128dw, 2   | 28x28x128 
conv1x1      | 1x1, 256, 1     | 28x28x256
-------------------------------------------------- 
conv_dw      | 3x3, 256dw, 1   | 28x28x256 
conv1x1      | 1x1, 256, 1     | 28x28x256 
-------------------------------------------------- 
conv_dw      | 3x3, 256dw, 2   | 14x14x256 
conv1x1      | 1x1, 512, 1     | 14x14x512
-------------------------------------------------- 
5x 
conv_dw      | 3x3, 512dw, 1   | 14x14x512 
conv1x1      | 1x1, 512, 1     | 14x14x512 
-------------------------------------------------- 
conv_dw      | 3x3, 512dw, 2   | 7x7x512 
conv1x1      | 1x1, 1024, 1    | 7x7x1024 
-------------------------------------------------- 
conv_dw      | 3x3, 1024dw, 1  | 7x7x1024 
conv1x1      | 1x1, 1024, 1    | 7x7x1024 
-------------------------------------------------- 
Avg Pool     | 7x7, 1          | 1x1x1024 
FC           | 1024, 1000      | 1x1x1000 
Softmax      | Classifier      | 1x1x1000
--------------------------------------------------
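
The output sizes in the table can be re-derived from the strides alone. The sketch below assumes SAME padding (output size = ceil(input size / stride)), which matches the `padding="SAME"` setting used in the implementation further down; the `LAYERS` encoding and the function name are made up for this illustration.

```python
import math

# (layer kind, output channels, stride), copied from the table; the 5x block is unrolled.
LAYERS = (
    [("conv", 32, 2)]
    + [("dw", 32, 1), ("pw", 64, 1)]
    + [("dw", 64, 2), ("pw", 128, 1)]
    + [("dw", 128, 1), ("pw", 128, 1)]
    + [("dw", 128, 2), ("pw", 256, 1)]
    + [("dw", 256, 1), ("pw", 256, 1)]
    + [("dw", 256, 2), ("pw", 512, 1)]
    + [("dw", 512, 1), ("pw", 512, 1)] * 5
    + [("dw", 512, 2), ("pw", 1024, 1)]
    + [("dw", 1024, 1), ("pw", 1024, 1)]
)


def trace_shapes(size=224, layers=LAYERS):
    """Return (height, width, channels) after each layer, assuming SAME padding."""
    shapes = []
    for _kind, channels, stride in layers:
        size = math.ceil(size / stride)  # SAME padding: out = ceil(in / stride)
        shapes.append((size, size, channels))
    return shapes
```

Calling `trace_shapes()` with the default 224 input walks from 112x112x32 down to 7x7x1024, which is why the average pool at the end uses a 7x7 window. A different input resolution would need a different pooling window.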

Code implementation

This post builds the MobileNet structure with the tensorflow.contrib.layers module. For how to build networks with tf.nn, tf.layers and similar APIs, see the corresponding code in the VGG post.

# --------------------------Method 1 --------------------------------------------
import tensorflow as tf  # TensorFlow 1.x (tf.contrib was removed in 2.x)
import tensorflow.contrib.layers as tcl
from tensorflow.contrib.framework import arg_scope


class Mobilenet:
    def __init__(self, resolution_inp=224, channel=3, name='mobilenet'):
        self.name = name
        self.channel = channel
        self.resolution_inp = resolution_inp

    def _depthwise_separable_conv(self, x, num_outputs, kernel_size=3, stride=1, scope=None):
        with tf.variable_scope(scope, "dw_blk"):
            # num_outputs=None skips the pointwise stage, so this call is depthwise only
            dw_conv = tcl.separable_conv2d(x, num_outputs=None,
                                           kernel_size=kernel_size,
                                           stride=stride,
                                           depth_multiplier=1)
            conv_1x1 = tcl.conv2d(dw_conv, num_outputs=num_outputs, kernel_size=1, stride=1)
            return conv_1x1

    def __call__(self, x, dropout=0.5, is_training=True):
        with tf.variable_scope(self.name) as scope:
            with arg_scope([tcl.batch_norm], is_training=is_training, scale=True):
                with arg_scope([tcl.conv2d, tcl.separable_conv2d],
                               activation_fn=tf.nn.relu,
                               normalizer_fn=tcl.batch_norm,
                               padding="SAME"):
                    conv1 = tcl.conv2d(x, 32, kernel_size=3, stride=2)

                    y = self._depthwise_separable_conv(conv1, 64, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 128, 3, stride=2)

                    y = self._depthwise_separable_conv(y, 128, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 256, 3, stride=2)

                    y = self._depthwise_separable_conv(y, 256, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 512, 3, stride=2)

                    y = self._depthwise_separable_conv(y, 512, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 512, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 512, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 512, 3, stride=1)
                    y = self._depthwise_separable_conv(y, 512, 3, stride=1)

                    # last two units output 1024 channels, as in the structure table
                    y = self._depthwise_separable_conv(y, 1024, 3, stride=2)
                    y = self._depthwise_separable_conv(y, 1024, 3, stride=1)

                    avg_pool = tcl.avg_pool2d(y, 7, stride=1)
                    flatten = tf.layers.flatten(avg_pool)

                    # raw logits: no ReLU here, softmax is applied below
                    self.fc6 = tf.layers.dense(flatten, units=1000, activation=None)
                    # dropout = tf.nn.dropout(fc6, keep_prob=0.5)
                    predictions = tf.nn.softmax(self.fc6)

                    return predictions

Running

This code has two parts. The timing function time_tensorflow_run takes a tf.Session, the tensor to evaluate, the corresponding feed dict and an info string, and reports the time needed to run that tensor 100 times (mean and standard deviation). The main function run_benchmark builds the Mobilenet model and measures the time spent on inference (forward pass) and on gradient computation (backward pass). The full code follows:

# -------------------------- Demo and Test --------------------------------------------
batch_size = 16
num_batches = 100
import time
import math
from datetime import datetime


def time_tensorflow_run(session, target, feed, info_string):
    """
    calculate time for each session run
    :param session: tf.Session
    :param target: operator or tensor to run with the session
    :param feed: feed dict for session
    :param info_string: info message for print
    :return: 
    """
    num_steps_burn_in = 10  # warm-up iterations
    total_duration = 0.0  # total time
    total_duration_squared = 0.0  # sum of squared times, used for the variance
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target, feed_dict=feed)

        duration = time.time() - start_time

        if i >= num_steps_burn_in:  # only count time after the warm-up iterations
            if not i % 10:
                print('[%s] step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches  # mean time per batch
    vr = max(total_duration_squared / num_batches - mn * mn, 0.0)  # variance, clamped against rounding error
    sd = math.sqrt(vr)  # standard deviation
    print('[%s] %s across %d steps, %.3f +/- %.3f sec/batch' % (datetime.now(), info_string, num_batches, mn, sd))


# test demo
def run_benchmark():
    """
    main function for test or demo
    :return: 
    """
    with tf.Graph().as_default():
        image_size = 224  # input image size
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))

        # build the MobileNet model
        model = Mobilenet(224, 3)
        prediction = model(images, is_training=True)
        fc = model.fc6

        params = tf.trainable_variables()

        for v in params:
            print(v)
        init = tf.global_variables_initializer()

        print("out shape ", prediction)
        sess = tf.Session()
        print("init...")
        sess.run(init)

        print("predict..")
        writer = tf.summary.FileWriter("./logs")
        writer.add_graph(sess.graph)
        time_tensorflow_run(sess, prediction, {}, "Forward")

        # simulate a training step
        objective = tf.nn.l2_loss(fc)  # a stand-in loss
        grad = tf.gradients(objective, params)  # gradients of the loss w.r.t. all model parameters

        print('grad backward')
        time_tensorflow_run(sess, grad, {}, "Forward-backward")
        writer.close()


if __name__ == '__main__':
    run_benchmark()
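
The statistics in time_tensorflow_run do not depend on TensorFlow: the mean and standard deviation come from two running sums, using Var[x] = E[x^2] - E[x]^2. A minimal sketch of the same idea for any Python callable follows; `running_stats` and `time_callable` are hypothetical names, and `time.perf_counter` is swapped in for `time.time` because it is the clock meant for measuring short intervals.

```python
import math
import time


def running_stats(durations):
    """Mean and standard deviation from the running sum and sum of squares."""
    total = 0.0
    total_sq = 0.0
    for d in durations:
        total += d
        total_sq += d * d
    mean = total / len(durations)
    var = max(total_sq / len(durations) - mean * mean, 0.0)  # clamp rounding error
    return mean, math.sqrt(var)


def time_callable(fn, num_batches=100, burn_in=10):
    """Time fn() num_batches times after burn_in untimed warm-up calls."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= burn_in:  # warm-up iterations are not counted
            durations.append(elapsed)
    return running_stats(durations)
```

A call such as `time_callable(lambda: sum(range(1000)))` returns a `(mean, std)` pair in seconds. The warm-up matters for TensorFlow in particular, since the first few session runs usually include one-time setup cost.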

Note: the complete code is available in the author's GitHub project.

Parameter count
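
As a rough check, the weight count can be derived from the structure table. The sketch below counts convolution and fully connected weights only; biases and batch-norm parameters are deliberately left out, and the names `POINTWISE_OUT` and `count_weights` are made up for this illustration.

```python
# 1x1 output channels of the 13 depthwise + 1x1 units, in table order.
POINTWISE_OUT = [64, 128, 128, 256, 256, 512] + [512] * 5 + [1024, 1024]


def count_weights(in_channels=3, first_out=32, k=3, num_classes=1000):
    """Count conv and FC weights only (no biases, no batch-norm parameters)."""
    total = k * k * in_channels * first_out  # first standard 3x3 convolution
    c_in = first_out
    for c_out in POINTWISE_OUT:
        total += k * k * c_in   # depthwise: one K x K kernel per channel
        total += c_in * c_out   # 1x1 convolution: c_in x c_out weights
        c_in = c_out
    total += c_in * num_classes  # final fully connected layer
    return total
```

`count_weights()` comes out at about 4.2 million, with most of the weights sitting in the 1x1 convolutions and the final fully connected layer rather than in the depthwise kernels.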

Comparison with other architectures

Time efficiency

References

https://blog.csdn.net/wfei101/article/details/78310226
https://blog.csdn.net/u013709270/article/details/78722985
1x1 convolution

Last edited: 2018.11.24 18:46:18
