Parts of this post draw on https://zhuanlan.zhihu.com/p/21586417
First, let's look at the most frequently cited ResNet figure:
As network architectures get deeper, they become harder and harder to train compared with shallow ones.
There are many widely used remedies today, such as Batch Normalization and Dropout; as the earlier post on BN showed, a network without BN can simply diverge.
The guiding principle in deep learning is that deeper is better: depth represents a kind of entropy, i.e. how far the network abstracts its features, and more abstract features are more likely to carry semantic meaning. So the question is how to make such deep networks trainable.
- How does ResNet solve this?
If the added units are linear, i.e. simply x = x, then the effective depth of the network does not really change.
For one layer of a deep network, the normal mapping is x -> f(x). With the structure in the figure above, the mapping becomes x -> h(x) + x. If we want the two to be equal, then h(x) + x = f(x), i.e. h(x) = f(x) - x, which is where the term "residual" comes from. When h(x) = 0, the block reduces to x -> x, and at the same time f(x) = x: on one hand the block is essentially linear, so the network can be stacked very deep; on the other hand, the nonlinear mapping we actually want is still propagated forward (a minimal sketch of this idea follows the list below).
- Another interpretation is that low-level features get fused with high-level features, which also contributes to the improvement; this explanation has some merit as well.
- Of course, there is also a later paper arguing that ResNet does not substantially increase the effective depth; I have not read it yet and will update this post once I have.
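To make the idea concrete, here is a minimal, purely illustrative sketch of a residual block (it uses the Keras 2 style add() rather than the older merge used in the code later in this post):
from keras.layers import Input, Dense, Activation, add
from keras.models import Model

# toy residual block on a 64-dimensional vector: y = h(x) + x
inp = Input(shape=(64,))
h = Dense(64, activation='relu')(inp)   # h(x), the residual the block has to learn
h = Dense(64)(h)
y = add([h, inp])                       # h(x) + x; if h(x) -> 0 this collapses to the identity
y = Activation('relu')(y)
model = Model(inputs=inp, outputs=y)
model.summary()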
- ResNet is built from two important basic blocks; as the names suggest, one has a convolution on the shortcut path and the other does not. The full network structure can be viewed at: http://link.zhihu.com/?target=http%3A//ethereon.github.io/netscope/%23/gist/db945b393d40bfa26006. Once you are familiar with these two blocks, the rest of the network is just stacking them.
conv_block
identity_block
- conv_block
The code does not bother naming each node. The main path has a 1-3-1 structure (1x1, 3x3, 1x1 convolutions); the shortcut path is a single 1x1 convolution.
def conv_block(input_tensor, filters):
    # main path: 1x1 -> 3x3 -> 1x1 convolutions, each followed by BN + ReLU
    filter1, filter2, filter3 = filters
    x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter3, (1, 1), strides=1)(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    # shortcut path: a single 1x1 convolution so the channel count matches filter3
    y = Conv2D(filter3, (1, 1), strides=1)(input_tensor)
    y = BatchNormalization(axis=-1)(y)
    y = Activation('relu')(y)
    # element-wise sum of the two paths (older Keras API; in Keras 2 use keras.layers.add([x, y]))
    out = merge([x, y], mode='sum')
    z = Activation('relu')(out)
    return z
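A quick shape check of conv_block (assuming the definition above and a Keras version where merge is still available): the spatial size is unchanged because all strides are 1, while the channel count becomes filter3.
from keras.layers import Input
from keras.models import Model

# apply conv_block to a 56x56x64 feature map and inspect the output shape
inp = Input(shape=(56, 56, 64))
out = conv_block(inp, [64, 64, 256])
Model(inputs=inp, outputs=out).summary()   # final output: (None, 56, 56, 256)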
- identity_block: the only difference is that the shortcut path has no convolution.
def identity_block(input_tensor, filters):
    # main path: the same 1x1 -> 3x3 -> 1x1 structure as conv_block
    filter1, filter2, filter3 = filters
    x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter3, (1, 1), strides=1)(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    # shortcut path: the identity itself, so input_tensor must already have filter3 channels
    # (older Keras API; in Keras 2 use keras.layers.add([x, input_tensor]))
    out = merge([x, input_tensor], mode='sum')
    z = Activation('relu')(out)
    return z
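Because the shortcut here is the raw input_tensor, identity_block only works when the input already has filter3 channels; a quick check under the same assumptions as above:
from keras.layers import Input
from keras.models import Model

# identity_block preserves shape: the input must already have filter3 (= 256) channels
inp = Input(shape=(56, 56, 256))
out = identity_block(inp, [64, 64, 256])
Model(inputs=inp, outputs=out).summary()   # output: (None, 56, 56, 256)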
The overall network structure is:
data 1,3,224,224
conv filter=64, kernel_size=7, pad=3, stride=2          1,64,112,112
bn
activation('relu')
maxpool kernel_size=3, stride=2                         1,64,56,56
# block 1 (64,64,256)
conv_block     in=1,64,56,56    filter=(64,64,256)    out=1,256,56,56
identity_block in=1,256,56,56   filter=(64,64,256)    out=1,256,56,56
identity_block in=1,256,56,56   filter=(64,64,256)    out=1,256,56,56
# block 2 (128,128,512)
conv_block     in=1,256,56,56   filter=(128,128,512)  out=1,512,28,28
identity_block in=1,512,28,28   filter=(128,128,512)  out=1,512,28,28
identity_block in=1,512,28,28   filter=(128,128,512)  out=1,512,28,28
identity_block in=1,512,28,28   filter=(128,128,512)  out=1,512,28,28
# block 3 (256,256,1024)
conv_block     in=1,512,28,28   filter=(256,256,1024) out=1,1024,14,14
identity_block in=1,1024,14,14  filter=(256,256,1024) out=1,1024,14,14
identity_block in=1,1024,14,14  filter=(256,256,1024) out=1,1024,14,14
identity_block in=1,1024,14,14  filter=(256,256,1024) out=1,1024,14,14
identity_block in=1,1024,14,14  filter=(256,256,1024) out=1,1024,14,14
identity_block in=1,1024,14,14  filter=(256,256,1024) out=1,1024,14,14
# block 4 (512,512,2048)
conv_block     in=1,1024,14,14  filter=(512,512,2048) out=1,2048,7,7
identity_block in=1,2048,7,7    filter=(512,512,2048) out=1,2048,7,7
identity_block in=1,2048,7,7    filter=(512,512,2048) out=1,2048,7,7
maxpool kernel_size=7, stride=1                       out=1,2048,1,1
flatten
dense(1,1000)
activation('softmax')
probability(1,1000)
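The spatial sizes in this listing follow the standard convolution/pooling output formula. A small illustrative helper (conv_out_size is just a name made up for this post, not part of the model code) makes the arithmetic explicit, including the cifar10-sized input actually used further down:
import math

# output size of a convolution / pooling layer (floor mode):
# out = floor((n + 2*pad - kernel) / stride) + 1
def conv_out_size(n, kernel, stride, pad=0):
    return math.floor((n + 2 * pad - kernel) / stride) + 1

print(conv_out_size(224, kernel=7, stride=2, pad=3))  # conv1 on an ImageNet-sized input: 224 -> 112
print(conv_out_size(32, kernel=7, stride=2, pad=3))   # conv1 on a cifar10 input: 32 -> 16
print(conv_out_size(16, kernel=3, stride=2, pad=0))   # maxpool: 16 -> 7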
The main script:
# coding:utf-8
import keras
from resnet_model import resnet_model
from keras.datasets import cifar10
from keras.utils import plot_model
from keras.callbacks import TensorBoard, ModelCheckpoint, LearningRateScheduler
import math

if __name__ == '__main__':
    n_class = 10
    img_w = 32
    img_h = 32
    BATCH_SIZE = 128
    EPOCH = 100

    # load cifar10, scale pixels to [0, 1] and one-hot encode the labels
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    x_train = x_train.astype('float32')
    x_train /= 255.
    y_train = keras.utils.np_utils.to_categorical(y_train, n_class)
    x_test = x_test.astype('float32')
    x_test /= 255.
    y_test = keras.utils.np_utils.to_categorical(y_test, n_class)

    # callbacks: TensorBoard logging, best-model checkpointing, step learning-rate decay
    tb = TensorBoard(log_dir='log')
    cp = ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=1, mode='auto')

    def step_decay(epoch):
        # halve the learning rate every 10 epochs, starting from 0.01
        initial_lrate = 0.01
        drop = 0.5
        epochs_drop = 10.0
        lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
        return lrate

    lr = LearningRateScheduler(step_decay)
    CB = [tb, cp, lr]

    # build and train the model
    input_shape = [x_train.shape[1], x_train.shape[2], x_train.shape[3]]
    model = resnet_model(out_class=n_class, input_shape=input_shape)
    plot_model(model, show_layer_names=1)
    model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCH, validation_split=0.3,
              callbacks=CB, shuffle=1)
    loss, acc = model.evaluate(x_test, y_test, batch_size=BATCH_SIZE)
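One small addition worth considering (not in the original script): since ModelCheckpoint saves the best weights to best_model.h5, they can be reloaded before the final evaluation, so the reported numbers come from the best validation epoch rather than the last one:
# optional: reload the best checkpointed weights, then evaluate and print the result
model.load_weights('best_model.h5')
loss, acc = model.evaluate(x_test, y_test, batch_size=BATCH_SIZE)
print('test loss: %.4f, test acc: %.4f' % (loss, acc))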
The model definition:
# coding: utf-8
from keras.models import Model
from keras.layers import Input,Conv2D,BatchNormalization,Activation,MaxPool2D,merge,Flatten,Dense
import math
# from identity_block import identity_block
# from conv_block import conv_block
# from keras.layers import Conv2D,BatchNormalization,Activation
def conv_block(input_tensor, filters):
    # main path: 1x1 -> 3x3 -> 1x1 convolutions, each followed by BN + ReLU
    filter1, filter2, filter3 = filters
    x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter3, (1, 1), strides=1)(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    # shortcut path: a single 1x1 convolution so the channel count matches filter3
    y = Conv2D(filter3, (1, 1), strides=1)(input_tensor)
    y = BatchNormalization(axis=-1)(y)
    y = Activation('relu')(y)
    # element-wise sum of the two paths (older Keras API; in Keras 2 use keras.layers.add([x, y]))
    out = merge([x, y], mode='sum')
    z = Activation('relu')(out)
    return z
def identity_block(input_tensor, filters):
    # main path: the same 1x1 -> 3x3 -> 1x1 structure as conv_block
    filter1, filter2, filter3 = filters
    x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    x = Conv2D(filter3, (1, 1), strides=1)(x)
    x = BatchNormalization(axis=-1)(x)
    x = Activation('relu')(x)
    # shortcut path: the identity itself, so input_tensor must already have filter3 channels
    # (older Keras API; in Keras 2 use keras.layers.add([x, input_tensor]))
    out = merge([x, input_tensor], mode='sum')
    z = Activation('relu')(out)
    return z
def resnet_model(out_class, input_shape):
    inputs = Input(shape=input_shape)  # reference ResNet-50 input: 1,3,224,224
    #
    x = Conv2D(64, (7, 7), strides=2, padding='same')(inputs)  # conv1 1,64,112,112
    x = BatchNormalization(axis=-1)(x)  # bn_conv1
    x = Activation('relu')(x)  # conv1_relu
    x = MaxPool2D(pool_size=(3, 3), strides=2)(x)  # 1,64,56,56
    # the shape comments below follow the reference ResNet-50; since conv_block here keeps
    # strides=1 everywhere, this implementation does not actually downsample inside the blocks
    # block1 (64,64,256) 1,2 in:1,64,56,56
    x = conv_block(x, [64, 64, 256])  # out=1,256,56,56
    x = identity_block(x, [64, 64, 256])  # out=1,256,56,56
    x = identity_block(x, [64, 64, 256])  # out=1,256,56,56
    # block2 (128,128,512) 1,3 in:1,256,56,56
    x = conv_block(x, [128, 128, 512])  # out=1,512,28,28
    x = identity_block(x, [128, 128, 512])  # out=1,512,28,28
    x = identity_block(x, [128, 128, 512])  # out=1,512,28,28
    x = identity_block(x, [128, 128, 512])  # out=1,512,28,28
    # block 3 (256,256,1024) 1,5 in:1,512,28,28
    x = conv_block(x, [256, 256, 1024])  # out=1,1024,14,14
    x = identity_block(x, [256, 256, 1024])  # out=1,1024,14,14
    x = identity_block(x, [256, 256, 1024])  # out=1,1024,14,14
    x = identity_block(x, [256, 256, 1024])  # out=1,1024,14,14
    x = identity_block(x, [256, 256, 1024])  # out=1,1024,14,14
    x = identity_block(x, [256, 256, 1024])  # out=1,1024,14,14
    # block 4 (512,512,2048) 1,2 in:1,1024,14,14
    x = conv_block(x, [512, 512, 2048])  # out=1,2048,7,7
    x = identity_block(x, [512, 512, 2048])  # out=1,2048,7,7
    x = identity_block(x, [512, 512, 2048])  # out=1,2048,7,7
    # maxpool kernel_size=7, stride=1 out=1,2048,1,1
    x = MaxPool2D(pool_size=(7, 7), strides=1)(x)
    # flatten
    x = Flatten()(x)
    # # Dense (reference ImageNet head)
    # x = Dense(1000)(x)  # out=1,1000
    # Dense: modified here to match the number of cifar10 classes
    x = Dense(out_class)(x)
    out = Activation('softmax')(x)
    model = Model(inputs=inputs, outputs=out)
    return model
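As a sanity check of the model definition, something like the following can be used to inspect the layer shapes and parameter count (assuming the file is saved as resnet_model.py, as the main script above already imports it):
from resnet_model import resnet_model

# build the cifar10 variant and inspect layer shapes and total parameters
model = resnet_model(out_class=10, input_shape=(32, 32, 3))
model.summary()
print('total parameters:', model.count_params())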
Training is running now. A GTX 1060 is really too limited for this; if you can afford it, go straight for a 1080 Ti.
- With epoch=300 at about 166 seconds per epoch, the whole run took roughly 13.8 hours.
The training-set results are decent: 99.75% accuracy with a loss of 0.0082. Since I have not trained on cifar10 many times (a previous VGG16 run reached 1.000), it is hard to say whether this figure is genuinely high.
On the test set the accuracy is only 74.39%, so the model is clearly overfitting, and the loss also fluctuates heavily.
Possible remedies to consider: try adding Dropout(0.5), add learning-rate decay, and ask whether the model is simply too complex for this task; after all, ResNet shines on ImageNet, and ImageNet is far larger than cifar10 in both image size and dataset size.
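For the Dropout idea, only the classification head inside resnet_model would need to change; a sketch of the modified lines (untested, and whether it actually helps remains to be seen):
from keras.layers import Dropout

# inside resnet_model(), between Flatten and the Dense head:
x = Flatten()(x)
x = Dropout(0.5)(x)          # randomly drop half of the features during training
x = Dense(out_class)(x)
out = Activation('softmax')(x)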