Life is like a dream. Looking back, there is a sense that youth has slipped away and the good days will not come again. Rushing along, we have somehow crossed so many thorns and setbacks; can we still hold on to our original aspiration and keep a pure heart? When I was young, my favorite poem was Jiang Jie's "Listening to the Rain" (《聽雨》):
In youth I listened to the rain from the singing loft, red candles dimming behind silk curtains.
In my prime I listened to the rain in a traveler's boat: the river wide, the clouds low, a stray wild goose crying in the west wind.
Now I listen to the rain beneath a monk's hut, my temples already flecked with grey.
Joy and sorrow, meeting and parting, are all without feeling; let the rain drip on the steps until dawn.
In deep learning, convolutional neural networks (CNNs) have played a major role in image recognition. CNNs have since evolved into many variants, several of which are milestones in that history: LeNet, AlexNet, GoogLeNet, VGG, and ResNet. The sections below introduce each of them and give a simple Keras implementation.
LeNet
LeNet extracts features with convolution, parameter sharing, and pooling (the common traits of convolutional networks), avoiding a large amount of computation, and then uses fully connected layers for classification. It is the starting point of a great many later architectures and inspired much of the field.
The handwritten-digit network LeNet-5 is described below. As shown in the figure above, and not counting the input layer, LeNet consists of 7 layers; the raw input image is 32×32 pixels. Ci denotes a convolutional layer, Si a subsampling (pooling) layer, and Fi a fully connected layer. The parameter and connection counts quoted for each layer are double-checked in a short script after the walkthrough.
- C1 layer (convolution): 6@28×28. This layer uses 6 convolution kernels of size 5×5, producing 6 feature maps of size (32-5+1)×(32-5+1) = 28×28, with no padding and a stride of 1. The parameter count is (5×5+1)×6 = 156, where 5×5 are the kernel weights and the 1 is the bias. Each feature map has 28×28 neurons, so the number of connections is (5×5+1)×6×28×28 = 122,304.
- S2 layer (subsampling / pooling): 6@14×14. This layer pools the feature maps (feature-dimension reduction) with 2×2 pooling units, so the 6 maps shrink from 28×28 to 14×14. The pooling windows do not overlap: each 2×2 region is aggregated into a single value, halving the image in each direction, which is why the 28×28 maps from C1 become 14×14.
The computation in this layer is: add the four values in a 2×2 unit, multiply by a trainable weight w, add a bias b (each feature map shares the same w and b), and pass the result through a sigmoid (squashing to the 0-1 range) to obtain the unit's output. Convolution and pooling are illustrated below:
Because every feature map in S2 shares the same w and b, the layer needs only 2×6 = 12 parameters. Each subsampled map is 14×14, so each feature map has 14×14 neurons, and each pooling unit has 2×2+1 connections (the 1 being the bias), giving (2×2+1)×14×14×6 = 5,880 connections.
- C3 layer (convolution): 16@10×10. C3 has 16 kernels of size 5×5, so each feature map is (14-5+1)×(14-5+1) = 10×10. Note that C3 is not fully connected to S2 but only partially connected: some C3 maps take 3 of the S2 maps as input, some take 4, and some all 6. This asymmetry lets the network extract more varied features; the connection pattern is given in the table below:
The parameter count of C3 is (5×5×3+1)×6 + (5×5×4+1)×9 + (5×5×6+1) = 1,516. Since each output map is 10×10, the number of connections is 1,516×10×10 = 151,600.
- S4 layer (subsampling / pooling): 16@5×5. The pooling units are 2×2, so, like C3, this layer has 16 feature maps, each of size 5×5. It needs 16×2 = 32 parameters and has (2×2+1)×5×5×16 = 2,000 connections.
- C5 layer (convolution): 120. This layer has 120 kernels, each again 5×5, producing 120 feature maps. Because the S4 maps are 5×5 and the kernel is also 5×5, each output map is (5-5+1)×(5-5+1) = 1×1, so the layer happens to act like a fully connected layer. That is a coincidence: with a larger input image it would not be fully connected. (Convolution used this way can in fact replace fully connected layers, and later architectures do exactly that.) The layer has 120×(5×5×16+1) = 48,120 parameters, and since each output map is 1×1 the number of connections is also 48,120×1×1 = 48,120.
- F6 layer (fully connected): 84. F6 has 84 units; the number comes from the design of the output layer, which corresponds to a 7×12 bitmap (shown below) in which -1 stands for white and 1 for black, so the black-and-white pattern of each symbol's bitmap acts as a code.
The layer has 84 feature maps of size 1×1, fully connected to C5. The parameter count is (120+1)×84 = 10,164. As in a classical neural network, F6 computes the dot product of the input vector and the weight vector, adds a bias, and passes the result through a sigmoid. Because the layer is fully connected, the number of connections equals the number of parameters, 10,164.
- OUTPUT layer: 10. The output layer is also fully connected, with 10 nodes representing the digits 0 through 9. If node i outputs 0, the network's prediction is the digit i.
This layer uses radial basis function (RBF) connections. Let x be the input from the previous layer and y_i the output of the i-th RBF unit; then y_i = Σ_j (x_j − w_ij)².
The weights w_ij are fixed by the bitmap code of digit i, with i running from 0 to 9 and j from 0 to 7×12−1. The closer an RBF output is to 0, the closer the current input is to the character i.
The layer has 84×10 = 840 parameters, and the number of connections is the same, 840.
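As a quick check, the parameter and connection counts quoted above can be reproduced with a few lines of plain Python arithmetic (this is only a sanity check, not part of the model code):
c1_params = (5*5 + 1) * 6                                  # 156
c1_connections = c1_params * 28 * 28                       # 122304
s2_params = 2 * 6                                          # 12
s2_connections = (2*2 + 1) * 14 * 14 * 6                   # 5880
c3_params = (5*5*3 + 1)*6 + (5*5*4 + 1)*9 + (5*5*6 + 1)    # 1516
c3_connections = c3_params * 10 * 10                       # 151600
s4_params = 16 * 2                                         # 32
s4_connections = (2*2 + 1) * 5 * 5 * 16                    # 2000
c5_params = 120 * (5*5*16 + 1)                             # 48120
f6_params = (120 + 1) * 84                                 # 10164
output_params = 84 * 10                                    # 840
print(c1_params, c1_connections, c3_params, c5_params, f6_params, output_params)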
The figure below illustrates the recognition of the digit 3:
A Keras implementation follows:
import numpy as np
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten
from keras.optimizers import Adam
#load the MNIST dataset from keras datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
#Process data
X_train = X_train.reshape(-1, 28, 28, 1) # Expand dimensions for single-channel images
X_test = X_test.reshape(-1, 28, 28, 1) # Expand dimensions for single-channel images
X_train = X_train / 255 # Normalize
X_test = X_test / 255 # Normalize
#One hot encoding
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
#Build LeNet model with Keras
def LeNet(width, height, depth, classes):
# initialize the model
model = Sequential()
# first layer, convolution and pooling
model.add(Conv2D(input_shape=(width, height, depth), kernel_size=(5, 5), filters=6, strides=(1,1), activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# second layer, convolution and pooling
model.add(Conv2D(kernel_size=(5, 5), filters=16, strides=(1,1), activation='tanh'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Fully connection layer
model.add(Flatten())
model.add(Dense(120,activation = 'tanh'))
model.add(Dense(84,activation = 'tanh'))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
return model
LeNet_model = LeNet(28,28,1,10)
LeNet_model.summary()
LeNet_model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),loss = 'categorical_crossentropy',metrics=['accuracy'])
#Start training
History = LeNet_model.fit(X_train, y_train, epochs=5, batch_size=32,validation_data=(X_test, y_test))
#Plot Loss and accuracy
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.plot(History.history['acc'])
plt.plot(History.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1,2,2)
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
AlexNet
AlexNet introduced several key improvements that laid the groundwork for later convolutional network design.
- Non-linear activation: ReLU
Traditional neural networks generally used sigmoid or tanh as the activation function, but these easily suffer from vanishing or saturating gradients. Take the sigmoid: when the input is very large or very small, the gradient of those neurons approaches 0 (gradient saturation), and because backpropagation multiplies by a sigmoid derivative at every layer, the gradient keeps shrinking and the network becomes very hard to train.
AlexNet instead uses the ReLU (Rectified Linear Unit) activation, f(x) = max(0, x): the output is 0 when the input is below 0 and equal to the input when it is above 0, as shown below:
Replacing sigmoid/tanh with ReLU greatly reduces computation, and because ReLU is linear on the positive side with a derivative of 1 there, convergence is much faster than with sigmoid/tanh.
- Data augmentation
One view is that neural networks are fed by data: providing much more training data raises accuracy because it prevents overfitting, which in turn allows the network to be made larger and deeper. When training data is limited, new samples can be generated from the existing set by simple transformations; the most common and generic ones are horizontal flips, random crops, translations, and color or lighting changes.
AlexNet augments its training data as follows (a small NumPy sketch of the crop-and-flip scheme appears just before the code listing below):
(1) Random cropping: 224×224 patches are randomly cropped from the 256×256 images and then horizontally flipped, which multiplies the number of samples by (256-224)²×2 = 2,048.
(2) At test time, five crops are taken (the four corners plus the center), each also flipped, giving 10 crops whose predictions are averaged. The authors note that without random cropping, large networks overfit badly.
(3) PCA (principal component analysis) is applied to the RGB channels and a Gaussian perturbation with mean 0 and standard deviation 0.1 is added along the principal components, i.e. a color and lighting transform; this lowered the error rate by a further 1%.
- Overlapping pooling
The pooling in AlexNet overlaps: the stride is smaller than the pooling window. AlexNet uses 3×3 pooling windows with a stride of 2, so adjacent windows overlap. Overlapping pooling helps against overfitting and contributed a 0.3% improvement in top-5 error.
- Local Response Normalization (LRN)
- Dropout
It goes without saying that Dropout is now almost a standard tool for suppressing overfitting in deep network architectures.
- Multi-GPU training
AlexNet was trained on GTX 580 GPUs. A single GTX 580 has only 3 GB of memory, which limits the size of the network it can hold, so the authors placed half of the kernels (neurons) on each of two GPUs and trained the network in parallel across them, speeding training up considerably. That is why the network diagram is drawn as two parallel streams:
其實(shí)也可以弄成一層網(wǎng)絡(luò)進(jìn)行訓(xùn)練,比如下圖载慈,和LeNet比較類似:
AlexNet網(wǎng)絡(luò)結(jié)構(gòu)共有8層惭等,前面5層是卷積層,后面3層是全連接層办铡,最后一個(gè)全連接層的輸出傳遞給一個(gè)1000路的softmax層辞做,對(duì)應(yīng)1000個(gè)類標(biāo)簽的分布琳要。以下為在開源數(shù)據(jù)dataset oxflower17上運(yùn)行的AlexNet實(shí)現(xiàn):
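Before the full Keras listing, here is a minimal NumPy sketch of the random-crop-and-flip augmentation described above. It assumes a single H×W×C image with H, W ≥ 224 (as with AlexNet's 256×256 inputs) and is illustrative only; the pipeline below relies on Keras's ImageDataGenerator (rescale, shear, zoom, horizontal flip) instead.
import numpy as np
def random_crop_flip(img, crop_size=224):
    # Randomly crop a crop_size x crop_size patch and flip it horizontally half the time,
    # mimicking the AlexNet-style augmentation described above.
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    patch = img[top:top + crop_size, left:left + crop_size]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch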
import numpy as np
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten,Dropout
from keras.optimizers import Adam
#Load oxflower17 dataset
import tflearn.datasets.oxflower17 as oxflower17
from sklearn.model_selection import train_test_split
x, y = oxflower17.load_data(one_hot=True)
#Split train and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,shuffle = True)
#Data augmentation with Keras tools
from keras.preprocessing.image import ImageDataGenerator
img_gen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True
)
#Build AlexNet model
def AlexNet(width, height, depth, classes):
model = Sequential()
#First Convolution and Pooling layer
model.add(Conv2D(96,(11,11),strides=(4,4),input_shape=(width,height,depth),padding='valid',activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
#Second Convolution and Pooling layer
model.add(Conv2D(256,(5,5),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
#Three convolution layers and a pooling layer
model.add(Conv2D(384,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(384,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
#Fully connection layer
model.add(Flatten())
model.add(Dense(4096,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1000,activation='relu'))
model.add(Dropout(0.5))
#Classfication layer
model.add(Dense(classes,activation='softmax'))
return model
AlexNet_model = AlexNet(224,224,3,17)
AlexNet_model.summary()
AlexNet_model.compile(optimizer=Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),loss = 'categorical_crossentropy',metrics=['accuracy'])
#Start training using the data augmentation generator
History = AlexNet_model.fit_generator(img_gen.flow(X_train*255, y_train, batch_size = 16),
steps_per_epoch = len(X_train)/16, validation_data = (X_test,y_test), epochs = 30 )
#Plot Loss and Accuracy
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.plot(History.history['acc'])
plt.plot(History.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1,2,2)
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
VGGNet
In 2014, the Visual Geometry Group at the University of Oxford, together with researchers from Google DeepMind, developed a new deep convolutional network, VGGNet, which took second place in the ILSVRC 2014 classification task (first place went to GoogLeNet, proposed the same year) and first place in the localization task.
VGGNet explored the relationship between the depth of a convolutional network and its performance. By successfully building networks 16 to 19 layers deep, it showed that increasing depth can substantially improve performance and cut the error rate, while remaining very extensible and generalizing well when transferred to other image data. VGG is still used today for extracting image features.
VGGNet can be seen as a deeper version of AlexNet: both consist of convolutional layers followed by fully connected layers, as shown in the figure below.
Its main characteristics are:
1. Simple, uniform structure
VGG consists of five groups of convolutional layers, three fully connected layers, and a softmax output layer; the groups are separated by max-pooling, and all hidden units use the ReLU activation.
2. Small kernels and more convolutional sub-layers
VGG replaces a layer with a large kernel by several layers with small 3×3 kernels. This reduces the number of parameters and, because each layer adds a non-linearity, increases the network's fitting and expressive capacity.
Small kernels are a defining feature of VGG. Although VGG follows the general layout of AlexNet, it does not use AlexNet's larger kernels (such as 7×7); instead it shrinks the kernel to 3×3 and stacks more convolutional sub-layers to reach the same effect (VGG: 1 to 4 sub-layers per group, AlexNet: 1).
The VGG authors note that a stack of two 3×3 convolutions has the same receptive field as one 5×5 convolution, and a stack of three 3×3 convolutions the same receptive field as one 7×7 convolution. Stacking therefore adds non-linearity while also reducing parameters (per channel pair, a 7×7 kernel has 49 weights versus 3×9 = 27 for three 3×3 kernels).
3. Small pooling windows
Where AlexNet uses 3×3 pooling, VGG uses 2×2 pooling throughout.
4. More channels
The first VGG layer has 64 channels, and the channel count doubles in each subsequent group up to a maximum of 512; more channels allow more information to be extracted.
5. Deeper layers, wider feature maps
Because the convolutions concentrate on increasing the channel count while the pooling concentrates on shrinking width and height, the model can be made deeper and wider while keeping the growth in computation under control.
6. Fully connected layers converted to convolutions (at test time)
This is another distinctive trick of VGG: at test time the three fully connected layers of the trained network are replaced by three convolutional layers. The resulting fully convolutional network is no longer constrained by the fully connected layers and can accept inputs of arbitrary width and height, which matters at test time. The training input is 224×224×3; if the last three layers stayed fully connected, every test image would first have to be resized to 224×224×3 to match the expected input size, which is inconvenient.
The replacement of fully connected layers by convolutions proceeds as follows (a minimal Keras sketch of the conversion appears right after the layer list below):
The convolutional structure can be written out as follows:
- The 224×224×3 input goes through two convolutions with 64 3×3 kernels + ReLU; the output is 224×224×64
- Max pooling with 2×2 windows (halving the spatial size) gives 112×112×64
- Two convolutions with 128 3×3 kernels + ReLU give 112×112×128
- 2×2 max pooling gives 56×56×128
- Three convolutions with 256 3×3 kernels + ReLU give 56×56×256
- 2×2 max pooling gives 28×28×256
- Three convolutions with 512 3×3 kernels + ReLU give 28×28×512
- 2×2 max pooling gives 14×14×512
- Three convolutions with 512 3×3 kernels + ReLU give 14×14×512
- 2×2 max pooling gives 7×7×512
- Three fully connected layers + ReLU: two of size 1×1×4096 and one of size 1×1×1000
- Softmax over the 1000 predicted classes
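To make the test-time replacement of fully connected layers by convolutions concrete, here is a minimal Keras sketch (assumed shapes, not the original VGG test code): the 4096-unit FC layer that sees the 7×7×512 feature map becomes a 7×7 convolution with 4096 filters, and the remaining FC layers become 1×1 convolutions, so the network can take inputs larger than 224×224 and produce a spatial map of class scores.
from keras.models import Sequential
from keras.layers import Conv2D
fc_as_conv = Sequential()
# Input: the 7x7x512 feature map from the last pooling stage (spatial size left as None)
fc_as_conv.add(Conv2D(4096, (7, 7), activation='relu', input_shape=(None, None, 512)))  # plays the role of Dense(4096)
fc_as_conv.add(Conv2D(4096, (1, 1), activation='relu'))                                 # plays the role of Dense(4096)
fc_as_conv.add(Conv2D(1000, (1, 1), activation='softmax'))                              # plays the role of Dense(1000)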
VGG16 is implemented below, again on the oxflower17 dataset:
import numpy as np
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten,Dropout
from keras.optimizers import Adam
#Load oxflower17 dataset
import tflearn.datasets.oxflower17 as oxflower17
from sklearn.model_selection import train_test_split
x, y = oxflower17.load_data(one_hot=True)
#Split train and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,shuffle = True)
#Data augmentation with Keras tools
from keras.preprocessing.image import ImageDataGenerator
img_gen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True
)
#Build VGG16Net model
def VGG16Net(width, height, depth, classes):
model = Sequential()
model.add(Conv2D(64,(3,3),strides=(1,1),input_shape=(width,height,depth),padding='same',activation='relu'))
model.add(Conv2D(64,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(128,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(128,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(4096,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1000,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes,activation='softmax'))
return model
VGG16_model = VGG16Net(224,224,3,17)
VGG16_model.summary()
VGG16_model.compile(optimizer=Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),loss = 'categorical_crossentropy',metrics=['accuracy'])
#Start training using the data augmentation generator
History = VGG16_model.fit_generator(img_gen.flow(X_train*255, y_train, batch_size = 16),
steps_per_epoch = len(X_train)/16, validation_data = (X_test,y_test), epochs = 30 )
#Plot Loss and Accuracy
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.plot(History.history['acc'])
plt.plot(History.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1,2,2)
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Getting a good result takes a fairly long training time, so I did not wait for it to finish; the code itself runs without problems.
Inception(GoogLeNet)
GoogLeNet won the ILSVRC 2014 classification task. Its innovation was not to keep deepening the network in the style of VGG or AlexNet, but to replace plain convolutional layers with a module called Inception. It has several times fewer trainable parameters than AlexNet yet better accuracy, which is how it won; the Inception line has since been updated through Inception-v4.
The heart of GoogLeNet is the Inception module:
GoogLeNet differs from earlier networks in the following ways:
- Plain convolution and pooling layers are replaced by Inception modules
- Average pooling replaces the fully connected layers before the classifier (a rough parameter comparison follows this list)
- Two auxiliary classifiers are added to the network to counteract vanishing gradients
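The saving from replacing the flatten-plus-fully-connected head with average pooling is easy to quantify. Assuming the 7×7×1024 feature map GoogLeNet produces before its classifier and a 1000-way output (a rough back-of-the-envelope comparison, not a figure from the paper):
flatten_dense_params = 7 * 7 * 1024 * 1000 + 1000   # Flatten -> Dense(1000): about 50.2M parameters
gap_dense_params = 1024 * 1000 + 1000               # average pooling -> Dense(1000): about 1.0M parameters
print(flatten_dense_params, gap_dense_params)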
We implement GoogLeNet from the table given in the Inception-v1 paper and, as above, train it on the oxflower17 dataset.
import numpy as np
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout, BatchNormalization, AveragePooling2D, Input, concatenate
from keras.models import Model,load_model
from keras.optimizers import Adam
#Load oxflower17 dataset
import tflearn.datasets.oxflower17 as oxflower17
from sklearn.model_selection import train_test_split
x, y = oxflower17.load_data(one_hot=True)
#Split train and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,shuffle = True)
#Data augmentation with Keras tools
from keras.preprocessing.image import ImageDataGenerator
img_gen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True
)
#Define convolution with batch normalization
def Conv2d_BN(x, nb_filter,kernel_size, padding='same',strides=(1,1),name=None):
if name is not None:
bn_name = name + '_bn'
conv_name = name + '_conv'
else:
bn_name = None
conv_name = None
x = Conv2D(nb_filter,kernel_size,padding=padding,strides=strides,activation='relu',name=conv_name)(x)
x = BatchNormalization(axis=3,name=bn_name)(x)
return x
#Define Inception structure
def Inception(x,nb_filter_para):
(branch1,branch2,branch3,branch4)= nb_filter_para
branch1x1 = Conv2D(branch1[0],(1,1), padding='same',strides=(1,1),activation='relu',name=None)(x)
branch3x3 = Conv2D(branch2[0],(1,1), padding='same',strides=(1,1),activation='relu',name=None)(x)
branch3x3 = Conv2D(branch2[1],(3,3), padding='same',strides=(1,1),activation='relu',name=None)(branch3x3)
branch5x5 = Conv2D(branch3[0],(1,1), padding='same',strides=(1,1),activation='relu',name=None)(x)
branch5x5 = Conv2D(branch3[1],(5,5), padding='same',strides=(1,1),activation='relu',name=None)(branch5x5)
branchpool = MaxPooling2D(pool_size=(3,3),strides=(1,1),padding='same')(x)
branchpool = Conv2D(branch4[0],(1,1),padding='same',strides=(1,1),activation='relu',name=None)(branchpool)
x = concatenate([branch1x1,branch3x3,branch5x5,branchpool],axis=3)
return x
#Build InceptionV1 model
def InceptionV1(width, height, depth, classes):
inpt = Input(shape=(width,height,depth))
x = Conv2d_BN(inpt,64,(7,7),strides=(2,2),padding='same')
x = MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same')(x)
x = Conv2d_BN(x,192,(3,3),strides=(1,1),padding='same')
x = MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same')(x)
x = Inception(x,[(64,),(96,128),(16,32),(32,)]) #Inception 3a 28x28x256
x = Inception(x,[(128,),(128,192),(32,96),(64,)]) #Inception 3b 28x28x480
x = MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same')(x) #14x14x480
x = Inception(x,[(192,),(96,208),(16,48),(64,)]) #Inception 4a 14x14x512
x = Inception(x,[(160,),(112,224),(24,64),(64,)]) #Inception 4b 14x14x512
x = Inception(x,[(128,),(128,256),(24,64),(64,)]) #Inception 4c 14x14x512
x = Inception(x,[(112,),(144,288),(32,64),(64,)]) #Inception 4d 14x14x528
x = Inception(x,[(256,),(160,320),(32,128),(128,)]) #Inception 4e 14x14x832
x = MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same')(x) #7x7x832
x = Inception(x,[(256,),(160,320),(32,128),(128,)]) #Inception 5a 7x7x832
x = Inception(x,[(384,),(192,384),(48,128),(128,)]) #Inception 5b 7x7x1024
#Average pooling (7x7) before the classifier head
x = AveragePooling2D(pool_size=(7,7),strides=(7,7),padding='same')(x)
x =Flatten()(x)
x = Dropout(0.4)(x)
x = Dense(1000,activation='relu')(x)
x = Dense(classes,activation='softmax')(x)
model = Model(inputs=inpt, outputs=x)
return model
InceptionV1_model = InceptionV1(224,224,3,17)
InceptionV1_model.summary()
InceptionV1_model.compile(optimizer=Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),loss = 'categorical_crossentropy',metrics=['accuracy'])
History = InceptionV1_model.fit_generator(img_gen.flow(X_train*255, y_train, batch_size = 16),steps_per_epoch = len(X_train)/16, validation_data = (X_test,y_test), epochs = 30 )
#Plot Loss and accuracy
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.plot(History.history['acc'])
plt.plot(History.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1,2,2)
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
ResNet
Experiments show that increasing the number of layers does improve accuracy, but beyond a certain depth the training accuracy starts to drop: networks that are too deep become harder to train.
Consider this thought experiment: suppose a relatively shallow network has already reached saturated accuracy. If a few identity-mapping layers (y = x, output equal to input) are appended to it, the network becomes deeper yet the error should at least not increase; in other words, a deeper network should not bring higher training error than its shallower counterpart. This idea of passing a layer's output directly to a later layer through an identity mapping is the inspiration behind the famous deep residual network, ResNet.
ResNet introduces the residual network structure, which allows networks to be made very deep (reportedly more than 1,000 layers) while still classifying very well. The basic residual block is shown in the figure below; it clearly contains a skip connection:
In the residual structure above, a "shortcut connection" passes the input x straight to the output, so the block computes H(x) = F(x) + x; when F(x) = 0 this reduces to H(x) = x, the identity mapping mentioned earlier. ResNet thus changes the learning target: instead of learning the complete output, each block learns the difference between the target H(x) and x, the so-called residual F(x) := H(x) − x. Training then only has to drive the residual toward 0, so accuracy does not degrade as the network gets deeper.
This skip structure breaks the convention that the output of layer n−1 can only feed layer n: a layer's output can jump over several layers and serve as input to a later layer. Its significance is that it offers a new way around the problem of the error rate rising, rather than falling, as more layers are stacked.
With this, network depth can go beyond its earlier limits to dozens, hundreds, or even a thousand layers, making high-level semantic feature extraction and classification feasible.
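As a minimal illustration of H(x) = F(x) + x, a basic residual block takes only a few lines with the Keras functional API (a sketch with an assumed 56×56×64 input; the ResNet-34 code below wraps the same idea in a reusable Residual_Block helper with an optional convolution on the shortcut):
from keras.layers import Input, Conv2D, add
from keras.models import Model
inp = Input(shape=(56, 56, 64))                                  # example feature map
f = Conv2D(64, (3, 3), padding='same', activation='relu')(inp)   # F(x): two 3x3 convolutions
f = Conv2D(64, (3, 3), padding='same')(f)
out = add([f, inp])                                              # H(x) = F(x) + x via the shortcut connection
residual_block = Model(inputs=inp, outputs=out)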
The figure below compares several architectures of different depths:
Based on the table above, we implement a ResNet:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jun 18 08:21:36 2019
@author: XFBY
"""
import numpy as np
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout, BatchNormalization, AveragePooling2D, GlobalAveragePooling2D, Input, add, concatenate
from keras.models import Model,load_model
from keras.optimizers import Adam
#Load oxflower17 dataset
import tflearn.datasets.oxflower17 as oxflower17
from sklearn.model_selection import train_test_split
x, y = oxflower17.load_data(one_hot=True)
#Split train and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,shuffle = True)
#Data augmentation with Keras tools
from keras.preprocessing.image import ImageDataGenerator
img_gen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True
)
#Define convolution with batch normalization
def Conv2d_BN(x, nb_filter,kernel_size, padding='same',strides=(1,1),name=None):
if name is not None:
bn_name = name + '_bn'
conv_name = name + '_conv'
else:
bn_name = None
conv_name = None
x = Conv2D(nb_filter,kernel_size,padding=padding,strides=strides,activation='relu',name=conv_name)(x)
x = BatchNormalization(axis=3,name=bn_name)(x)
return x
#Define Residual Block for ResNet34(2 convolution layers)
def Residual_Block(input_model,nb_filter,kernel_size,strides=(1,1), with_conv_shortcut =False):
x = Conv2d_BN(input_model,nb_filter=nb_filter,kernel_size=kernel_size,strides=strides,padding='same')
x = Conv2d_BN(x, nb_filter=nb_filter, kernel_size=kernel_size,padding='same')
#A convolution is needed on the shortcut when the channel count or spatial size changes
if with_conv_shortcut:
shortcut = Conv2d_BN(input_model,nb_filter=nb_filter,strides=strides,kernel_size=kernel_size)
x = add([x,shortcut])
return x
else:
x = add([x,input_model])
return x
#Built ResNet34
def ResNet34(width, height, depth, classes):
Img = Input(shape=(width,height,depth))
x = Conv2d_BN(Img,64,(7,7),strides=(2,2),padding='same')
x = MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same')(x)
#Residual conv2_x output 56x56x64
x = Residual_Block(x,nb_filter=64,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=64,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=64,kernel_size=(3,3))
#Residual conv3_x output 28x28x128
x = Residual_Block(x,nb_filter=128,kernel_size=(3,3),strides=(2,2),with_conv_shortcut=True)# need do convolution to add different channel
x = Residual_Block(x,nb_filter=128,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=128,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=128,kernel_size=(3,3))
#Residual conv4_x output 14x14x256
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3),strides=(2,2),with_conv_shortcut=True)# need do convolution to add different channel
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=256,kernel_size=(3,3))
#Residual conv5_x output 7x7x512
x = Residual_Block(x,nb_filter=512,kernel_size=(3,3),strides=(2,2),with_conv_shortcut=True)
x = Residual_Block(x,nb_filter=512,kernel_size=(3,3))
x = Residual_Block(x,nb_filter=512,kernel_size=(3,3))
#Use global average pooling instead of Flatten
x = GlobalAveragePooling2D()(x)
x = Dense(classes,activation='softmax')(x)
model = Model(inputs=Img, outputs=x)
return model
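The listing above only defines the ResNet-34 model; to produce the History object used by the plotting code below, it still has to be instantiated, compiled, and trained. The lines below simply mirror the earlier AlexNet/VGG/Inception examples (same optimizer settings and data generator):
ResNet34_model = ResNet34(224,224,3,17)
ResNet34_model.summary()
ResNet34_model.compile(optimizer=Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=1e-08),loss = 'categorical_crossentropy',metrics=['accuracy'])
History = ResNet34_model.fit_generator(img_gen.flow(X_train*255, y_train, batch_size = 16),steps_per_epoch = len(X_train)/16, validation_data = (X_test,y_test), epochs = 30 )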
#Plot Loss and accuracy
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.plot(History.history['acc'])
plt.plot(History.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1,2,2)
plt.plot(History.history['loss'])
plt.plot(History.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()