Classic CNN Architectures: MobileNet-v2 (PyTorch Implementation)

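Preface: I'm a computer vision beginner. I write these posts to help fellow beginners get started easily and to let the experts brush up on the basics (doge). If you have any questions, leave a comment and we can discuss them together~

I recommend the Bilibili uploader 劈里啪啦Wz. The slides in this post use his images, and his explanations are excellent~

In earlier posts I covered AlexNet, VGG, GoogLeNet, and ResNet. They are all traditional convolutional neural networks (all built from standard convolution layers). Their drawback is that the large memory footprint and heavy computation make them impractical to run on mobile and embedded devices. MobileNet, the subject of this post, is designed specifically for mobile and embedded targets.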

MobileNet v1

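MobileNet was proposed by a Google team in 2017 as a lightweight CNN aimed at mobile and embedded devices. Compared with traditional CNNs, it greatly reduces the number of parameters and the amount of computation at the cost of a small drop in accuracy (accuracy is 0.9% lower than VGG16, while the model has only 1/32 of VGG's parameters).

The highlight of MobileNet is undoubtedly its Depthwise Convolution structure, which greatly reduces both computation and parameter count. The figure below shows the difference between a standard convolution and a DW convolution. In a standard convolution, each kernel has as many channels as the input feature map (every kernel is convolved with every channel of the input feature map).

[Figure]

In a DW convolution, each kernel has exactly one channel. Each kernel handles only one channel of the input feature map, so the number of kernels must equal the number of input channels, and the output feature map therefore has the same number of channels as the input.

As just noted, a DW convolution produces as many output channels as it has input channels. To change or customize the number of output channels, simply follow the DW convolution with a PW convolution.

As the figure below shows, a PW convolution is just an ordinary convolution with a kernel size of 1. DW and PW convolutions are usually used together, and the pair is called a Depthwise Separable Convolution.

[Figure]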
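The DW + PW pairing described above can be sketched in PyTorch as follows. This is a minimal sketch, not the code from the paper: the channel counts (32 in, 64 out) and the 56x56 input are made-up numbers for illustration. Setting `groups` equal to the number of input channels is what turns `nn.Conv2d` into a depthwise convolution.

```python
import torch
from torch import nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


in_ch, out_ch = 32, 64  # illustrative values, not taken from the paper

# Standard 3x3 convolution: every kernel spans all input channels.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)

# Depthwise 3x3 convolution: one single-channel kernel per input channel.
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                      groups=in_ch, bias=False)
# Pointwise 1x1 convolution: changes the channel count.
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
separable = nn.Sequential(depthwise, pointwise)

x = torch.randn(1, in_ch, 56, 56)
print(tuple(standard(x).shape), tuple(separable(x).shape))  # same output shape
print(count_params(standard), count_params(separable))      # 18432 vs 2336
```

Both paths produce a tensor of the same shape, but the separable version needs 3*3*32 + 32*64 = 2336 weights instead of 3*3*32*64 = 18432.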

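So how much computation does a Depthwise Separable Convolution actually save compared with a standard convolution? The figure below compares the cost of the two. Df is the width and height of the input feature map (assumed equal), Dk is the kernel size, M is the number of input channels, and N is the number of output channels. The cost of a convolution is approximately kernel height x kernel width x kernel channels x number of kernels x input height x input width (assuming stride 1). In MobileNet the DW convolutions all use 3x3 kernels, so in theory a standard convolution costs 8 to 9 times as much as DW+PW, because (Dk·Dk·M·Df·Df + M·N·Df·Df) / (Dk·Dk·M·N·Df·Df) = 1/N + 1/Dk² (formula from the original paper):

[Figure]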
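To make the 8-to-9x figure concrete, here is a small sketch that just evaluates the two cost formulas. The function names and the example layer size (3x3 kernel, 512 input and output channels, 14x14 feature map) are my own choices for illustration.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution (stride 1, same-size output)."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df):
    """Multiply-adds of a DW convolution followed by a 1x1 PW convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
dk, m, n, df = 3, 512, 512, 14
ratio = standard_conv_cost(dk, m, n, df) / separable_conv_cost(dk, m, n, df)
print(f"standard / separable = {ratio:.2f}")
```

The ratio equals 1 / (1/N + 1/Dk²), so with 3x3 kernels it approaches 9 as N grows and is a little lower for layers with few output channels.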

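Now that we understand Depthwise Separable Convolution, let's look at the MobileNet v1 architecture. The table on the left is the MobileNet v1 architecture: entries marked Conv are standard convolutions, Conv dw is the DW convolution just described, and s is the stride. With this table, building MobileNet v1 is straightforward.

The original MobileNet v1 paper also introduces two hyperparameters, α and ρ.

Width multiplier: to build smaller and cheaper models, the paper introduces a parameter α called the width multiplier. It thins the network uniformly by multiplying the channel count of every layer by a fixed ratio, which reduces the number of channels in each layer. Typical values are 1, 0.75, 0.5, and 0.25.

Resolution multiplier: to reduce computation further, the paper introduces a second parameter ρ, the resolution multiplier. It scales the size of the feature map at every layer by a fixed ratio.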
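As a rough sketch of how the two multipliers affect cost, the function below applies α to the channel counts and ρ to the feature map size of one depthwise separable layer. The function name, the use of `round`, and the example layer size are assumptions made for illustration.

```python
def separable_layer_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of one DW + PW layer after applying alpha and rho.

    alpha scales the input/output channel counts, rho scales the
    feature map width and height.
    """
    m_scaled = round(alpha * m)
    n_scaled = round(alpha * n)
    df_scaled = round(rho * df)
    depthwise = dk * dk * m_scaled * df_scaled * df_scaled
    pointwise = m_scaled * n_scaled * df_scaled * df_scaled
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = separable_layer_cost(3, 512, 512, 14)
half_width = separable_layer_cost(3, 512, 512, 14, alpha=0.5)
half_res = separable_layer_cost(3, 512, 512, 14, rho=0.5)
print(base, half_width, half_res)
print(half_width / base, half_res / base)
```

Because the pointwise term dominates, halving α cuts the cost to roughly a quarter (the cost scales with about α²), and halving ρ cuts it to exactly a quarter here (the cost scales with ρ²).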

The right side of the figure below lists the classification accuracy, computation, and parameter count of networks built with different values of α and ρ:

[Figure]

MobileNet v2

The MobileNet v1 architecture table shows that the network is a straight pipe like VGG, with none of the shortcut connections found in ResNet. Some people have also reported that the DW convolutions in MobileNet v1 easily end up degenerate after training, so the results are not as good as hoped. That brings us to MobileNet v2.

MobileNet v2 was proposed by a Google team in 2018. Compared with MobileNet v1, it is more accurate and smaller.

Key features of the MobileNet v2 model:

[Figure]

As the figure above shows, MobileNet v2 improves on v1.

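As noted, the highlight of MobileNet v1 is the DW convolution. The highlight of MobileNet v2 is the inverted residual block; the paper also analyzes several weaknesses of v1 and addresses each of them. The improvements themselves are very simple, but the paper's analysis of the weaknesses draws on manifold learning and related material, which makes the reasoning hard to follow. Here is a brief summary of the problem analysis given in v2, which will hopefully help when reading the paper. Readers interested in the motivation behind v2 should read the paper itself.

When we look at the pixel values of each channel of a feature map on their own, the features those values encode can be mapped onto a manifold that lies in a low-dimensional subspace. A convolution is usually followed by an activation function to add nonlinearity, and the most common choice is ReLU. According to the data processing inequality (DPI) introduced in the residual network article, ReLU inevitably loses information, the loss cannot be recovered, and it is more pronounced when the number of channels is very small. Why? Consider the example in Figure 6 (shown below). The input is a matrix representing data on a manifold. Much like a convolution, it goes through n ReLU operations to produce an n-channel feature map, and we then try to reconstruct the input from those n feature maps: the closer the reconstruction, the less information was lost. Figure 6 shows that when n is small the information loss from ReLU is severe, but when n is large the input manifold is reconstructed well.

[Figure]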
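The experiment can be approximated with a few lines of NumPy. This is a simplified sketch under my own assumptions, not the paper's exact procedure: a 2-D spiral plays the role of the input manifold, a random Gaussian matrix `T` embeds it into n dimensions, and ReLU is applied there. Instead of drawing the reconstruction, the sketch counts how many points are still exactly recoverable: a 2-D point can be solved for whenever at least two channels stay positive after ReLU (two random rows of `T` are linearly independent with probability 1).

```python
import numpy as np


def recoverable_fraction(n, num_points=2000, seed=0):
    """Fraction of spiral points that survive ReLU in an n-dim embedding.

    A point counts as recoverable when at least two of its n channels
    are positive after ReLU, because two active rows of T are enough
    to solve for the original 2-D coordinates.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.5, 4 * np.pi, num_points)
    x = np.stack([t * np.cos(t), t * np.sin(t)])  # 2-D spiral, shape (2, P)
    T = rng.standard_normal((n, 2))               # random embedding matrix
    y = np.maximum(T @ x, 0.0)                    # ReLU in n dimensions
    active_channels = (y > 0).sum(axis=0)
    return float((active_channels >= 2).mean())


for n in (2, 3, 5, 15, 30):
    print(n, recoverable_fraction(n))
```

With only 2 or 3 channels a large share of the spiral is wiped out, while with 15 or 30 channels nearly every point keeps enough active channels to be recovered, which matches the trend described above.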

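Based on the analysis of the information loss above, there are two possible fixes:

Since ReLU causes the information loss, replace ReLU with a linear activation;
Since more channels reduce the information loss, use more channels.

In the figure below, the left side is the residual block from ResNet and the right side is the inverted residual block from MobileNet v2.

[Figure]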

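A residual block runs 1x1 convolution (reduce channels) -> 3x3 convolution -> 1x1 convolution (restore channels). The inverted residual block does the opposite: 1x1 convolution (expand channels) -> 3x3 DW convolution -> 1x1 convolution (reduce channels). Why? The paper's explanation is that high-dimensional information loses less when it passes through ReLU. Note that the inverted residual block mostly uses the ReLU6 activation, but the final 1x1 convolution uses a linear activation.

Note that not every inverted residual block has a shortcut connection. A shortcut is used only when stride=1 and the input and output feature maps have the same shape. Two tensors can only be added when their shapes match, and stride=1 alone does not guarantee that the input and output have the same number of channels.

[Figure]

The figure below is the MobileNet v2 architecture table. t is the expansion factor (of the first 1x1 convolution in the inverted residual block), c is the number of output channels, n is the number of times the block is repeated, and s is the stride. Note that the stride applies only to the first of the n repeated blocks; the rest use stride 1.

[Figure]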
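To see how the t, c, n, s columns unfold into individual blocks, and which blocks end up with a shortcut, here is a small pure-Python sketch. The table values are the ones used in `inverted_residual_setting` in model.py later in this post (with alpha = 1); the helper name `expand_blocks` is hypothetical.

```python
# (t, c, n, s) rows, copied from inverted_residual_setting in model.py
settings = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_blocks(settings, input_channel=32):
    """Unroll the table into (in_ch, out_ch, stride, t, use_shortcut) tuples."""
    blocks = []
    for t, c, n, s in settings:
        for i in range(n):
            stride = s if i == 0 else 1  # s applies to the first repeat only
            use_shortcut = stride == 1 and input_channel == c
            blocks.append((input_channel, c, stride, t, use_shortcut))
            input_channel = c
    return blocks


blocks = expand_blocks(settings)
for block in blocks:
    print(block)
print(len(blocks), sum(b[4] for b in blocks))  # 17 blocks, 10 with a shortcut
```

The row (6, 96, 3, 1) is the case mentioned above: its first block has stride 1 but goes from 64 to 96 channels, so it gets no shortcut.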

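Some questions

Why does the bottleneck in MobileNet V2 expand the number of channels first and compress it afterwards?

The core of MobileNet is the depthwise convolution, which reduces computation and parameter count. To add shortcuts, if we followed ResNet and compressed the feature map first, the feature map fed to the depthwise convolution would have too few channels, leaving little feature information to extract. MobileNet V2 therefore expands first and compresses afterwards.

Why does the bottleneck in MobileNet V2 use a linear activation after the final 1*1 convolution?

The 1*1 convolution before the activation has already compressed the feature map, and ReLU outputs 0 for negative inputs, which would lose even more information. A linear activation is used instead.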

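Summary

In this post we covered two versions of MobileNet. Their comparison with a standard convolution is shown below.

[Figure]

As panel (b) shows, the main contribution of MobileNet v1 is the Depthwise Separable Convolution, which splits into a depthwise convolution and a pointwise convolution. MobileNet v2 mainly combines residual networks with depthwise separable convolution. By analyzing the manifold of single-channel features, it improves the residual block, both by expanding the intermediate layers (d) and by using a linear activation on the bottleneck layer (c). The separable design of Depthwise Separable Convolution shrinks the model by roughly 8x without a severe loss of accuracy, which is very impressive.

The design of depthwise separable convolution is elegant, but unfortunately cuDNN support for it is currently poor, so we see little benefit from it when training on a GPU. Serial CPU execution does not have this problem, which gives MobileNet plenty of room in the market, especially on embedded platforms.

Finally, the series of proofs in the v2 paper is excellent. We can understand how v2 works without them, but they are well worth careful study, especially for people working in research.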

Code

Notes:

Downloading the training set used here is explained in detail in the AlexNet post: https://blog.csdn.net/weixin_44023658/article/details/105798326
The transfer learning approach used here is covered in my post: Transfer Learning: A Plain-Language Introduction (PyTorch example)

# model.py
from torch import nn
import torch


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNReLU(nn.Sequential):
    # groups=1 gives a standard convolution; groups=in_channel gives a DW convolution
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


# inverted residual block
class InvertedResidual(nn.Module):
    # expand_ratio is the expansion factor t
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    # alpha is the width multiplier hyperparameter
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# train.py
import json
import os

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets

from model import MobileNetV2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    "val": transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

data_root = os.path.abspath(os.path.join(os.getcwd(), "../../.."))  # get data root path
image_path = data_root + "/data_set/flower_data/"  # flower data set path

train_dataset = datasets.ImageFolder(root=image_path + "train",
                                     transform=data_transform["train"])
train_num = len(train_dataset)

# {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
# write dict into json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

batch_size = 16
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size, shuffle=True,
                                           num_workers=0)

validate_dataset = datasets.ImageFolder(root=image_path + "val",
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                              batch_size=batch_size, shuffle=False,
                                              num_workers=0)

net = MobileNetV2(num_classes=5)
# load pretrained weights (map_location lets this run on CPU-only machines)
model_weight_path = "./mobilenet_v2.pth"
pre_weights = torch.load(model_weight_path, map_location=device)
# delete classifier weights
pre_dict = {k: v for k, v in pre_weights.items() if "classifier" not in k}
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)

# freeze features weights
for param in net.features.parameters():
    param.requires_grad = False

net.to(device)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

best_acc = 0.0
save_path = './MobileNetV2.pth'
for epoch in range(5):
    # train
    net.train()
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        logits = net(images.to(device))
        loss = loss_function(logits, labels.to(device))
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        # print train process
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate * 100), a, b, loss.item()), end="")
    print()

    # validate
    net.eval()
    acc = 0.0  # accumulate accurate number / epoch
    with torch.no_grad():
        for val_data in validate_loader:
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))  # eval model only have last output layer
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == val_labels.to(device)).sum().item()
        val_accurate = acc / val_num
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
        # average the loss over the number of batches (step is zero-based)
        print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
              (epoch + 1, running_loss / len(train_loader), val_accurate))

print('Finished Training')

# predict.py
import json

import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

from model import MobileNetV2

data_transform = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(224),
     transforms.ToTensor(),
     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

# load image
img = Image.open("sunflower.jpg")
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    with open('./class_indices.json', 'r') as json_file:
        class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

# create model
model = MobileNetV2(num_classes=5)
# load model weights (map_location lets this run on CPU-only machines)
model_weight_path = "./MobileNetV2.pth"
model.load_state_dict(torch.load(model_weight_path, map_location="cpu"))
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).item()
print(class_indict[str(predict_cla)], predict[predict_cla].item())
plt.show()