1. Overview
Paper: Densely Connected Convolutional Networks
Paper link: https://arxiv.org/pdf/1608.06993.pdf
Official code (GitHub): https://github.com/liuzhuang13/DenseNet
As the CVPR 2017 Best Paper, DenseNet steps away from the fixed mindset of improving performance by making the network deeper (ResNet) or wider (Inception). Approaching the problem from the perspective of features instead, it uses feature reuse and bypass connections to dramatically reduce the number of parameters while also alleviating the gradient vanishing problem to some extent. Combining the ideas of information flow and feature reuse, DenseNet fully deserved to be named the best paper of the top computer vision conference of 2017.
After lying dormant for nearly 20 years, convolutional neural networks have become one of the most important architectures in deep learning. From the five-layer LeNet, to the 19-layer VGG, to Highway Networks and ResNet, which were the first to cross 100 layers, increasing depth has been one of the main directions in the development of CNNs.
As CNNs grow deeper, however, gradient vanishing and model degradation appear. The widespread use of Batch Normalization alleviates gradient vanishing to some extent, while ResNet and Highway Networks add bypasses built on identity mappings to further reduce gradient vanishing and model degradation. FractalNets run sub-networks of different depths in parallel, gaining depth while preserving gradient propagation, and stochastic-depth networks randomly deactivate layers during training, which both reveals the redundancy in ResNet's depth and mitigates the problems above. Although these frameworks deepen the network in different ways, they share the same core idea: connecting feature maps across layers.
As another very deep convolutional network, DenseNet has the following advantages:
- (1) Fewer parameters than ResNet.
- (2) Bypass connections that strengthen feature reuse.
- (3) Easier training, with a certain regularizing effect.
- (4) Alleviation of the gradient vanishing and model degradation problems.
When proposing ResNet, Kaiming He made the following assumption: if the extra layers of a deeper network (compared with a shallower one) are able to learn identity mappings, then the trained deeper model will perform no worse than the shallower one.
In plain terms, if you build a new network by adding layers that can learn identity mappings to an existing one, the worst case is that these added layers simply become identity mappings after training and leave the original network's performance untouched. DenseNet likewise starts from an assumption: instead of learning redundant features over and over, feature reuse is a better way to extract features.
2. DenseNet
In deep networks, the gradient vanishing problem becomes more pronounced as depth increases, and many papers propose solutions, e.g. ResNet, Highway Networks, Stochastic Depth and FractalNets. Although their architectures differ, the core idea is the same: create short paths from early layers to later layers. So what do the authors do? They push this idea to its limit: under the premise of maximizing information flow between layers, they simply connect all layers directly to each other!
Start with a diagram of a dense block. In a traditional convolutional network with L layers there are L connections, but in a DenseNet dense block there are L(L+1)/2 connections. Put simply, the input of each layer is the concatenation of the outputs of all preceding layers. In the figure: x0 is the input, H1's input is x0, H2's input is x0 and x1 (x1 being H1's output), and so on.
One advantage of DenseNet is that the network is narrower and has fewer parameters, largely thanks to the dense block design: as discussed later, each convolutional layer inside a dense block outputs only a small number of feature maps (fewer than 100), rather than the hundreds or thousands of channels common in other networks. At the same time, this connectivity pattern makes the propagation of features and gradients more effective, so the network is easier to train.
One sentence from the paper sums it up nicely: "Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision." This directly explains why the network works so well. As mentioned earlier, gradient vanishing is more likely the deeper the network gets, because the input signal and the gradients have to pass through many layers; with dense connections, every layer is effectively connected directly to the input and to the loss, which mitigates gradient vanishing, so greater depth is no longer a problem. The authors also observe that dense connections have a regularizing effect and thus help suppress overfitting; in my view this is because the number of parameters is reduced (explained later), which lessens overfitting.
One nice thing about this paper is that there are almost no equations, unlike papers that pile up complicated formulas to look impressive. There are only two, used to describe the relationship between DenseNet and ResNet, and they matter for understanding both networks at the level of principle.
The first equation is ResNet's. Here l indexes the layer, x_l is the output of layer l, and H_l is a nonlinear transformation. For ResNet, the output of layer l is the output of layer l-1 plus a nonlinear transformation of that output:
x_l = H_l(x_{l-1}) + x_{l-1}
The second equation is DenseNet's. [x_0, x_1, ..., x_{l-1}] denotes the concatenation of the feature maps output by layers 0 through l-1. Concatenation merges channels, as in Inception, whereas the ResNet operation above adds values and leaves the channel count unchanged. H_l here consists of BN, ReLU and a 3x3 convolution:
x_l = H_l([x_0, x_1, ..., x_{l-1}])
These two equations capture the essential difference between DenseNet and ResNet very concisely.
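To make the difference concrete, here is a toy sketch (mine, not from the paper) contrasting the two rules on random tensors; H_l is stood in for by a single 3x3 convolution and the shapes follow the 56 * 56 example used later:
import torch
import torch.nn as nn

# ResNet: x_l = H_l(x_{l-1}) + x_{l-1} -- element-wise addition, channel count unchanged
x_prev = torch.randn(1, 64, 56, 56)
H_res = nn.Conv2d(64, 64, kernel_size=3, padding=1)
x_res = H_res(x_prev) + x_prev
print(x_res.shape)  # torch.Size([1, 64, 56, 56])

# DenseNet: x_l = H_l([x_0, x_1, ..., x_{l-1}]) -- inputs concatenated along the channel axis
x0 = torch.randn(1, 64, 56, 56)           # block input
x1 = torch.randn(1, 32, 56, 56)           # output of H_1 (growth rate k = 32)
H_dense = nn.Conv2d(64 + 32, 32, kernel_size=3, padding=1)
x2 = H_dense(torch.cat([x0, x1], dim=1))
print(x2.shape)  # torch.Size([1, 32, 56, 56])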
Next, the identity function that the paper keeps mentioning:
it is simply a function whose output equals its input.
A traditional feed-forward network can be viewed as an algorithm that operates on a network state (the feature maps): the state is passed from layer to layer, each layer reads the state from the previous layer and writes to the next, possibly changing it while also carrying along information that should be preserved. ResNet makes this preserved information explicit through identity transformations.
Now look further down. Figure 1 above shows a dense block, while Figure 2 below shows a whole DenseNet, which contains 3 dense blocks. The authors divide DenseNet into multiple dense blocks so that the feature maps within each block share the same size, which makes concatenation unproblematic.
Table 1 (below) gives the overall network architecture.
The k = 32 and k = 48 in the table are the growth rate: the number of feature maps output by each layer within a dense block. To keep the network from becoming too wide, the authors use a small k, such as 32;
their experiments also show that a small k gives better results. Because of the dense block design, later layers receive the outputs of all earlier layers as input, so the channel count after concatenation is still fairly large. In addition, each 3x3 convolution in a dense block is preceded by a 1x1 convolution, the so-called bottleneck layer, whose purpose is to reduce the number of input feature maps: it lowers dimensionality to cut computation while also fusing features across channels, a win on both counts.
To compress the model further, the authors also insert a 1x1 convolution between every two dense blocks. In the later experiments, DenseNet-C denotes a network with this transition layer, whose 1x1 convolution by default outputs half as many channels as its input. DenseNet-BC denotes a network with both bottleneck layers and transition layers.
In more detail, the bottleneck and transition layer work as follows.
Each dense block contains many sub-structures. Take Dense Block (3) of DenseNet-169 as an example: it contains 32 sub-structures, each consisting of a 1x1 and a 3x3 convolution, so the input of the 32nd sub-structure is the output of the preceding 31 layers, each of which outputs 32 channels (the growth rate). Without the bottleneck, the input to the 32nd layer's 3x3 convolution would be 31*32 plus the output channels of the previous dense block, close to 1000 channels. With the 1x1 convolution added, whose output channels in the code are growth_rate*4 = 128, those 128 channels become the input to the 3x3 convolution instead, which greatly reduces computation. That is the bottleneck.
As for the transition layer, it sits between two dense blocks because each dense block ends with a large number of output channels, and a 1x1 convolution is needed to reduce them. Staying with Dense Block (3) of DenseNet-169: although the 3x3 convolution of the 32nd layer outputs only 32 channels (the growth rate), its output is then concatenated with its input, just as in the earlier layers; since that input is around 1000 channels, the output of the whole dense block also ends up with over 1000 channels. The transition layer therefore takes a reduction parameter (between 0 and 1) that specifies what fraction of these channels to keep, 0.5 by default, so the channel count is halved before being handed to the next dense block; that is the transition layer's job. The paper also uses dropout to randomly drop connections and curb overfitting; this network certainly has plenty of connections.
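To see these numbers concretely, here is a rough channel-bookkeeping sketch (my own, assuming DenseNet-169 uses block_config (6, 12, 32, 32), growth rate k = 32, bottleneck width 4*k, and compression 0.5):
growth_rate, bn_size, reduction = 32, 4, 0.5
num_features = 64  # channels after the stem convolution

for block_id, num_layers in enumerate((6, 12, 32, 32), start=1):
    for layer_id in range(num_layers):
        concat_in = num_features + layer_id * growth_rate  # input channels to the 1x1 bottleneck
        bottleneck_out = bn_size * growth_rate             # 128 channels fed to the 3x3 convolution
    num_features += num_layers * growth_rate               # channels after concatenation in this block
    print("block %d ends with %d channels (last 3x3 conv saw a %d-channel concat via the bottleneck)"
          % (block_id, num_features, concat_in))
    if block_id != 4:
        num_features = int(num_features * reduction)       # the transition layer halves the channels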
Experimental results:
The DenseNet configuration differs slightly across datasets: on ImageNet, DenseNet-BC uses 4 dense blocks, while the other datasets use only 3. More details are in the Implementation Details part of Section 3 of the paper; training details and hyperparameters are in Section 4.2. For testing on ImageNet, a 224*224 center crop is used.
Table 2 compares DenseNet with other methods on three datasets (C10, C100, SVHN). ResNet [11] is Kaiming He's paper, and the comparison speaks for itself. DenseNet-BC really does have far fewer parameters than a plain DenseNet of the same depth; besides saving memory, fewer parameters also reduce overfitting. On SVHN, DenseNet-BC does not match DenseNet (k=24); the authors attribute this to SVHN being a relatively easy dataset, so the deeper model overfits. From the three DenseNets with different L and k in the second-to-last block of the table, the results improve as L and k increase.
Figure 3 compares DenseNet-BC and ResNet on ImageNet. The left plot shows parameter count against error rate: at the same error rate you can compare parameter counts, and at the same parameter budget you can compare error rates; the improvement is clear either way. The right plot shows FLOPs (roughly, computational cost) against error rate, with the same conclusion.
Figure 4 is also important. The left plot compares parameters and error across DenseNet variants. The middle plot compares DenseNet-BC and ResNet: at the same error, DenseNet-BC needs far fewer parameters. The right plot shows that DenseNet-BC-100 needs only a small fraction of the parameters to match the result of ResNet-1001.
By design, DenseNet lets each layer use the feature maps of all preceding layers. To explore how features are actually reused, the authors trained a DenseNet with L = 40 and k = 12 and, for every convolutional layer within each dense block, computed the average absolute value of the weights assigned to the feature maps of each preceding layer. This average indicates how much a layer relies on the features of a given earlier layer; the heat map below is drawn from these averages.
From the figure we can draw the following conclusions:
- (a) Features extracted by early layers can still be used directly by much deeper layers.
- (b) Even the transition layers use features from all layers of the preceding dense block.
- (c) Layers in the 2nd and 3rd dense blocks make little use of the outputs of the preceding transition layer, indicating that the transition layer outputs many redundant features. This supports DenseNet-BC, i.e. the need for compression.
- (d) Although the final classification layer uses information from many layers of the preceding dense block, it leans towards the last few feature maps, suggesting that some high-level features are only produced in the last layers of the network.
It is also worth mentioning the relation between DenseNet and stochastic depth. In stochastic depth, layers inside the residual blocks are randomly dropped during training, which effectively connects the surviving neighbouring layers directly; this is quite similar to DenseNet.
Summary:
The core idea of DenseNet is to build connections between different layers, making full use of features and further alleviating the gradient vanishing problem, so that going deeper is no longer an issue and training works very well. In addition, the bottleneck layers, transition layers and the small growth rate make the network narrow and reduce its parameters, which effectively suppresses overfitting and also lowers computation. DenseNet has many strengths, and its advantages over ResNet in the comparisons are quite clear.
PyTorch implementation (based on the official torchvision densenet121)
model.py
The overall pipeline (a spatial-size sketch follows the list):
1. Input: an image
2. Feature block (the first convolution layer in the figure; a pooling layer can follow it, not drawn here)
3. First dense block, containing n dense layers (the grey circles); every dense layer is densely connected, i.e. the input of each layer is the concatenation of the outputs of all preceding layers
4. First transition block, made up of a convolution and a pooling layer
5. Second dense block
6. Second transition block
7. Third dense block
8. Classification block, made up of pooling and linear layers, outputting the softmax scores
9. Prediction layer: softmax classification
10. Output: class probabilities
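The 56 * 56 feature-map sizes used in the walkthrough below come from a 224 * 224 input; here is a small sketch (mine, assuming the stride-2 stem convolution, the stride-2 max pooling, and a stride-2 average pooling in each transition block of densenet121):
size = 224
size //= 2              # 7x7 stem convolution, stride 2 -> 112
size //= 2              # 3x3 max pooling, stride 2      -> 56: the first dense block works on 56 * 56 maps
for i in range(3):      # three transition blocks, each ending in a 2x2 average pool with stride 2
    size //= 2          # -> 28, 14, 7
print(size)             # 7: the last dense block and the global average pooling see 7 * 7 maps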
Dense Layer
Input: (56 * 56 * 64) at the beginning, or the output of the previous dense layer
1. Batch Normalization, output (56 * 56 * 64)
2. ReLU, output (56 * 56 * 64)
3.
- 1x1 Convolution, kernel_size=1, channels = bn_size * growth_rate, so the output is (56 * 56 * 128)
- Batch Normalization (56 * 56 * 128)
- ReLU (56 * 56 * 128)
4. 3x3 Convolution, kernel_size=3, channels = growth_rate, output (56 * 56 * 32)
5. Dropout, optional, to prevent overfitting, output (56 * 56 * 32)
class _DenseLayer(nn.Sequential):
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate, memory_efficient=False):  # num_input_features: number of input feature channels
        super(_DenseLayer, self).__init__()  # growth_rate = 32 (growth rate), bn_size = 4
#(56 * 56 * 64)
self.add_module('norm1', nn.BatchNorm2d(num_input_features)),
self.add_module('relu1', nn.ReLU(inplace=True)),
self.add_module('conv1', nn.Conv2d(num_input_features, bn_size *
growth_rate, kernel_size=1, stride=1,
bias=False)),
self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)),
self.add_module('relu2', nn.ReLU(inplace=True)),
self.add_module('conv2', nn.Conv2d(bn_size * growth_rate, growth_rate,
kernel_size=3, stride=1, padding=1,
bias=False)),
#(56 * 56 * 32)
self.drop_rate = drop_rate
self.memory_efficient = memory_efficient
def forward(self, *prev_features):
bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)#(56 * 56 * 64*3)
if self.memory_efficient and any(prev_feature.requires_grad for prev_feature in prev_features):
bottleneck_output = cp.checkpoint(bn_function, *prev_features)
else:
bottleneck_output = bn_function(*prev_features)
# bn1 + relu1 + conv1
new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
if self.drop_rate > 0:
new_features = F.dropout(new_features, p=self.drop_rate,
training=self.training)
return new_features
def _bn_function_factory(norm, relu, conv):
def bn_function(*inputs):
# type(List[Tensor]) -> Tensor
        concated_features = torch.cat(inputs, 1)  # concatenate along the channel dimension
# bn1 + relu1 + conv1
bottleneck_output = conv(relu(norm(concated_features)))
return bottleneck_output
return bn_function
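A quick shape check of the layer defined above (my sketch; the 64-channel, 56 * 56 input matches the numbers used in this walkthrough):
import torch

layer = _DenseLayer(num_input_features=64, growth_rate=32, bn_size=4, drop_rate=0)
x0 = torch.randn(1, 64, 56, 56)
out = layer(x0)          # forward takes *prev_features and concatenates them internally
print(out.shape)         # torch.Size([1, 32, 56, 56]) -- growth_rate new channels

# With two previous feature maps the bottleneck sees 64 + 32 = 96 input channels:
layer2 = _DenseLayer(num_input_features=96, growth_rate=32, bn_size=4, drop_rate=0)
print(layer2(x0, out).shape)   # torch.Size([1, 32, 56, 56])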
Dense Block
A dense block consists of L dense layers:
layer 0: input (56 * 56 * 64) -> output (56 * 56 * 32)
layer 1: input (56 * 56 * (64 + 32 * 1)) -> output (56 * 56 * 32)
layer 2: input (56 * 56 * (64 + 32 * 2)) -> output (56 * 56 * 32)
…
layer L: input (56 * 56 * (64 + 32 * L)) -> output (56 * 56 * 32)
Note that the output size of every dense layer stays the same, while the number of input channels grows from layer to layer, because, as described above, each layer's input is the concatenation of the block input with the outputs of all preceding layers.
class _DenseBlock(nn.Module):
def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate, memory_efficient=False):
        super(_DenseBlock, self).__init__()  # num_layers: how many dense layers to stack in this block
for i in range(num_layers):
layer = _DenseLayer(
                num_input_features + i * growth_rate,  # each layer adds growth_rate (32) input channels
growth_rate=growth_rate,
bn_size=bn_size,
drop_rate=drop_rate,
memory_efficient=memory_efficient,
)
            self.add_module('denselayer%d' % (i + 1), layer)  # register each dense layer as a submodule
def forward(self, init_features):
        features = [init_features]  # the original input features (64 channels)
        for name, layer in self.named_children():  # iterate over the registered dense layers in order
            new_features = layer(*features)  # compute this layer's new features
            features.append(new_features)  # append them to the running list
        return torch.cat(features, 1)  # concatenate along channels: 64 + 6*32 = 256
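And a shape check of the whole block (my sketch; 6 layers, as in the first dense block of densenet121):
import torch

block = _DenseBlock(num_layers=6, num_input_features=64, bn_size=4,
                    growth_rate=32, drop_rate=0)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)    # torch.Size([1, 256, 56, 56]) -- 64 + 6*32 channels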
Transition Block
A transition block sits between two dense blocks and consists of a convolution plus a pooling layer (the dimensions below use the first transition block as an example; the module definition is shown right after this list):
Input: the output of the preceding dense block, (56 * 56 * 256)
1. Batch Normalization, output (56 * 56 * 256)
2. ReLU, output (56 * 56 * 256)
3. 1x1 Convolution, kernel_size=1; here the channel count is compressed by a preset compression factor (between 0 and 1) to reduce parameters, output (56 * 56 * (256 * compression)), i.e. (56 * 56 * 128) with the default 0.5
4. 2x2 Average Pooling, output (28 * 28 * 128)
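For reference, the transition module itself (it appears again in the consolidated model.py further down) is just BN -> ReLU -> 1x1 convolution -> 2x2 average pooling:
class _Transition(nn.Sequential):
    def __init__(self, num_input_features, num_output_features):
        super(_Transition, self).__init__()
        self.add_module('norm', nn.BatchNorm2d(num_input_features))
        self.add_module('relu', nn.ReLU(inplace=True))
        self.add_module('conv', nn.Conv2d(num_input_features, num_output_features,
                                          kernel_size=1, stride=1, bias=False))
        self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))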
class DenseNet(nn.Module):
r"""Densenet-BC model class, based on
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
growth_rate (int) - how many filters to add each layer (`k` in paper)
block_config (list of 4 ints) - how many layers in each pooling block
num_init_features (int) - the number of filters to learn in the first convolution layer
bn_size (int) - multiplicative factor for number of bottle neck layers
(i.e. bn_size * k features in the bottleneck layer)
drop_rate (float) - dropout rate after each dense layer
num_classes (int) - number of classification classes
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
num_init_features=64, bn_size=4, drop_rate=0, num_classes=1000, memory_efficient=False):
super(DenseNet, self).__init__()
# First convolution
self.features = nn.Sequential(OrderedDict([
('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2,
padding=3, bias=False)),
('norm0', nn.BatchNorm2d(num_init_features)),
('relu0', nn.ReLU(inplace=True)),
('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
]))
# Each denseblock
num_features = num_init_features
for i, num_layers in enumerate(block_config):
block = _DenseBlock(
                num_layers=num_layers,              # number of dense layers in this block
                num_input_features=num_features,    # input channels (64 for the first block)
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,                # dropout rate (0 here)
                memory_efficient=memory_efficient
            )
            self.features.add_module('denseblock%d' % (i + 1), block)  # append the dense block
            num_features = num_features + num_layers * growth_rate  # update num_features, e.g. 64 + 6*32 = 256 after the first block
            if i != len(block_config) - 1:  # add a transition layer after every dense block except the last
                trans = _Transition(num_input_features=num_features,
                                    num_output_features=num_features // 2)  # halve the output channels
                self.features.add_module('transition%d' % (i + 1), trans)
                num_features = num_features // 2  # update num_features (integer division)
# Final batch norm
self.features.add_module('norm5', nn.BatchNorm2d(num_features))
# Linear layer
self.classifier = nn.Linear(num_features, num_classes)
# Official init from torch repo.
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.constant_(m.bias, 0)
def forward(self, x):
        features = self.features(x)  # feature extraction layers
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))  # adaptive average pooling to output size (1, 1)
        out = torch.flatten(out, 1)
        out = self.classifier(out)  # classifier
return out
# def _load_state_dict(model, model_url, progress):
# # '.'s are no longer allowed in module names, but previous _DenseLayer
# # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
# # They are also in the checkpoints in model_urls. This pattern is used
# # to find such keys.
# pattern = re.compile(
# r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')
#
# state_dict = load_state_dict_from_url(model_url, progress=progress)
# for key in list(state_dict.keys()):
# res = pattern.match(key)
# if res:
# new_key = res.group(1) + res.group(2)
# state_dict[new_key] = state_dict[key]
# del state_dict[key]
# model.load_state_dict(state_dict)
def _densenet(arch, growth_rate, block_config, num_init_features, pretrained, progress,
**kwargs):
model = DenseNet(growth_rate, block_config, num_init_features, **kwargs)
if pretrained:
_load_state_dict(model, model_urls[arch], progress)
return model
def densenet121(pretrained=False, progress=True, **kwargs):
r"""Densenet-121 model from
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
return _densenet('densenet121', 32, (6, 12, 24, 16), 64, pretrained, progress,
**kwargs)
Putting the above together
#model.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as cp
from collections import OrderedDict
#from .utils import load_state_dict_from_url
__all__ = ['DenseNet', 'densenet121', 'densenet169', 'densenet201', 'densenet161']
class _DenseLayer(nn.Sequential):
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate, memory_efficient=False):  # num_input_features: number of input feature channels
        super(_DenseLayer, self).__init__()  # growth_rate = 32 (growth rate), bn_size = 4
#(56 * 56 * 64)
self.add_module('norm1', nn.BatchNorm2d(num_input_features)),
self.add_module('relu1', nn.ReLU(inplace=True)),
self.add_module('conv1', nn.Conv2d(num_input_features, bn_size *
growth_rate, kernel_size=1, stride=1,
bias=False)),
self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)),
self.add_module('relu2', nn.ReLU(inplace=True)),
self.add_module('conv2', nn.Conv2d(bn_size * growth_rate, growth_rate,
kernel_size=3, stride=1, padding=1,
bias=False)),
#(56 * 56 * 32)
self.drop_rate = drop_rate
self.memory_efficient = memory_efficient
def forward(self, *prev_features):
bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)#(56 * 56 * 64*3)
if self.memory_efficient and any(prev_feature.requires_grad for prev_feature in prev_features):
bottleneck_output = cp.checkpoint(bn_function, *prev_features)
else:
bottleneck_output = bn_function(*prev_features)
# bn1 + relu1 + conv1
new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
if self.drop_rate > 0:
new_features = F.dropout(new_features, p=self.drop_rate,
training=self.training)
return new_features
def _bn_function_factory(norm, relu, conv):
def bn_function(*inputs):
# type(List[Tensor]) -> Tensor
        concated_features = torch.cat(inputs, 1)  # concatenate along the channel dimension
# bn1 + relu1 + conv1
bottleneck_output = conv(relu(norm(concated_features)))
return bottleneck_output
return bn_function
class _DenseBlock(nn.Module):
def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate, memory_efficient=False):
        super(_DenseBlock, self).__init__()  # num_layers: how many dense layers to stack in this block
for i in range(num_layers):
layer = _DenseLayer(
                num_input_features + i * growth_rate,  # each layer adds growth_rate (32) input channels
growth_rate=growth_rate,
bn_size=bn_size,
drop_rate=drop_rate,
memory_efficient=memory_efficient,
)
            self.add_module('denselayer%d' % (i + 1), layer)  # register each dense layer as a submodule
def forward(self, init_features):
        features = [init_features]  # the original input features (64 channels)
        for name, layer in self.named_children():  # iterate over the registered dense layers in order
            new_features = layer(*features)  # compute this layer's new features
            features.append(new_features)  # append them to the running list
        return torch.cat(features, 1)  # concatenate the features along the channel dimension
class _Transition(nn.Sequential):
def __init__(self, num_input_features, num_output_features):
super(_Transition, self).__init__()
self.add_module('norm', nn.BatchNorm2d(num_input_features))
self.add_module('relu', nn.ReLU(inplace=True))
self.add_module('conv', nn.Conv2d(num_input_features, num_output_features,
kernel_size=1, stride=1, bias=False))
self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))
class DenseNet(nn.Module):
r"""Densenet-BC model class, based on
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
growth_rate (int) - how many filters to add each layer (`k` in paper)
block_config (list of 4 ints) - how many layers in each pooling block
num_init_features (int) - the number of filters to learn in the first convolution layer
bn_size (int) - multiplicative factor for number of bottle neck layers
(i.e. bn_size * k features in the bottleneck layer)
drop_rate (float) - dropout rate after each dense layer
num_classes (int) - number of classification classes
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
num_init_features=64, bn_size=4, drop_rate=0, num_classes=1000, memory_efficient=False):
super(DenseNet, self).__init__()
# First convolution
self.features = nn.Sequential(OrderedDict([
('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2,
padding=3, bias=False)),
('norm0', nn.BatchNorm2d(num_init_features)),
('relu0', nn.ReLU(inplace=True)),
('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
]))
# Each denseblock
num_features = num_init_features
for i, num_layers in enumerate(block_config):
block = _DenseBlock(
                num_layers=num_layers,              # number of dense layers in this block
                num_input_features=num_features,    # input channels (64 for the first block)
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,                # dropout rate (0 here)
                memory_efficient=memory_efficient
            )
            self.features.add_module('denseblock%d' % (i + 1), block)  # append the dense block
            num_features = num_features + num_layers * growth_rate  # update num_features, e.g. 64 + 6*32 = 256 after the first block
            if i != len(block_config) - 1:  # add a transition layer after every dense block except the last
                trans = _Transition(num_input_features=num_features,
                                    num_output_features=num_features // 2)  # halve the output channels
                self.features.add_module('transition%d' % (i + 1), trans)
                num_features = num_features // 2  # update num_features (integer division)
# Final batch norm
self.features.add_module('norm5', nn.BatchNorm2d(num_features))
# Linear layer
self.classifier = nn.Linear(num_features, num_classes)
# Official init from torch repo.
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.constant_(m.bias, 0)
def forward(self, x):
        features = self.features(x)  # feature extraction layers
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))  # adaptive average pooling to output size (1, 1)
        out = torch.flatten(out, 1)
        out = self.classifier(out)  # classifier
return out
# def _load_state_dict(model, model_url, progress):
# # '.'s are no longer allowed in module names, but previous _DenseLayer
# # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
# # They are also in the checkpoints in model_urls. This pattern is used
# # to find such keys.
# pattern = re.compile(
# r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')
#
# state_dict = load_state_dict_from_url(model_url, progress=progress)
# for key in list(state_dict.keys()):
# res = pattern.match(key)
# if res:
# new_key = res.group(1) + res.group(2)
# state_dict[new_key] = state_dict[key]
# del state_dict[key]
# model.load_state_dict(state_dict)
def _densenet(arch, growth_rate, block_config, num_init_features, pretrained, progress,
**kwargs):
model = DenseNet(growth_rate, block_config, num_init_features, **kwargs)
if pretrained:
_load_state_dict(model, model_urls[arch], progress)
return model
def densenet121(pretrained=False, progress=True, **kwargs):
r"""Densenet-121 model from
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
return _densenet('densenet121', 32, (6, 12, 24, 16), 64, pretrained, progress,
**kwargs)
def densenet161(pretrained=False, progress=True, **kwargs):
r"""Densenet-161 model from
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
return _densenet('densenet161', 48, (6, 12, 36, 24), 96, pretrained, progress,
**kwargs)
def densenet169(pretrained=False, progress=True, **kwargs):
r"""Densenet-169 model from
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
return _densenet('densenet169', 32, (6, 12, 32, 32), 64, pretrained, progress,
**kwargs)
def densenet201(pretrained=False, progress=True, **kwargs):
r"""Densenet-201 model from
`"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
"""
return _densenet('densenet201', 32, (6, 12, 48, 32), 64, pretrained, progress,
**kwargs)
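A quick sanity check of the assembled model (my sketch; it assumes the code above is saved as model.py in the current directory):
import torch
from model import densenet121

net = densenet121()                 # pretrained=False by default, so no weights are downloaded
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = net.features(x)         # expected: torch.Size([2, 1024, 7, 7]) for densenet121
    out = net(x)                    # expected: torch.Size([2, 1000]) class scores
print(feats.shape, out.shape)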
Note: downloading the training dataset is explained in detail in my AlexNet post: https://blog.csdn.net/weixin_44023658/article/details/105798326
#train.py
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import matplotlib.pyplot as plt
from model import densenet121
import os
import torch.optim as optim
import torchvision.models.densenet
import torchvision.models as models
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
data_transform = {
"train": transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),  # normalization parameters from the official torchvision example
    "val": transforms.Compose([transforms.Resize(256),  # resize the shorter side to 256
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
data_root = os.path.abspath(os.path.join(os.getcwd(), "../../..")) # get data root path
image_path = data_root + "/data_set/flower_data/" # flower data set path
train_dataset = datasets.ImageFolder(root=image_path + "train",
transform=data_transform["train"])
train_num = len(train_dataset)
# {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
# write dict into json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
json_file.write(json_str)
batch_size = 16
train_loader = torch.utils.data.DataLoader(train_dataset,
batch_size=batch_size, shuffle=True,
num_workers=0)
validate_dataset = datasets.ImageFolder(root=image_path + "/val",
transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset,
batch_size=batch_size, shuffle=False,
num_workers=0)
# transfer learning: load pretrained weights, then replace the classifier head
net = models.densenet121(pretrained=False)
model_weight_path="./densenet121-a.pth"
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path), strict= False)
inchannel = net.classifier.in_features
net.classifier = nn.Linear(inchannel, 5)
net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)
# plain training from scratch (no pretrained weights)
# net = densenet121(pretrained=False)
# net.to(device)
# inchannel = net.classifier.in_features
# net.classifier = nn.Linear(inchannel, 5)
#
# loss_function = nn.CrossEntropyLoss()
# optimizer = optim.Adam(net.parameters(), lr=0.0001)
best_acc = 0.0
save_path = './densenet121.pth'
for epoch in range(10):
# train
net.train()
running_loss = 0.0
for step, data in enumerate(train_loader, start=0):
images, labels = data
optimizer.zero_grad()
logits = net(images.to(device))
loss = loss_function(logits, labels.to(device))
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
# print train process
rate = (step+1)/len(train_loader)
a = "*" * int(rate * 50)
b = "." * int((1 - rate) * 50)
print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate*100), a, b, loss), end="")
print()
# validate
net.eval()
acc = 0.0 # accumulate accurate number / epoch
with torch.no_grad():
for val_data in validate_loader:
val_images, val_labels = val_data
outputs = net(val_images.to(device)) # eval model only have last output layer
# loss = loss_function(outputs, test_labels)
predict_y = torch.max(outputs, dim=1)[1]
acc += (predict_y == val_labels.to(device)).sum().item()
val_accurate = acc / val_num
if val_accurate > best_acc:
best_acc = val_accurate
torch.save(net.state_dict(), save_path)
print('[epoch %d] train_loss: %.3f test_accuracy: %.3f' %
(epoch + 1, running_loss / step, val_accurate))
print('Finished Training')
#predict.py
import torch
from model import densenet121
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json
data_transform = transforms.Compose(
[transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
# load image
img = Image.open("./roses.jpg")
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)
# read class_indict
try:
json_file = open('./class_indices.json', 'r')
class_indict = json.load(json_file)
except Exception as e:
print(e)
exit(-1)
# create model
model = densenet121(num_classes=5)
# load model weights
model_weight_path = "./densenet121.pth"
model.load_state_dict(torch.load(model_weight_path))
model.eval()
with torch.no_grad():
# predict class
output = torch.squeeze(model(img))
predict = torch.softmax(output, dim=0)
predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)], predict[predict_cla].numpy())
plt.show()
DenseNet overall: dense blocks alternating with transition layers, followed by flattening and softmax classification.
Reference:
AI之路