Fast R-CNN network structure: stage1_fast_rcnn_train.pt
First, the data preparation stage:
name: "ZF"
layer {
name: 'data'
type: 'Python'
top: 'data'
top: 'rois'
top: 'labels'
top: 'bbox_targets'
top: 'bbox_inside_weights'
top: 'bbox_outside_weights'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 21"
}
}
Open the roi_data_layer.layer module and look at the forward function:
This function was already used once during train_rpn. Only the configuration differs now, so a few details change:
As before, it starts by fetching the blobs: blobs = self._get_next_minibatch()
Step into _get_next_minibatch:
It first draws two images at random from the dataset: db_inds = self._get_next_minibatch_inds(). Inside _get_next_minibatch_inds, cfg.TRAIN.IMS_PER_BATCH=2, so each call draws two images, meaning every minibatch processes two images (unlike train_rpn, which draws one).
Back in _get_next_minibatch, the next step collects the roidb entries for those images: minibatch_db = [self._roidb[i] for i in db_inds]. This is a list, which is passed to get_minibatch to produce the result: return get_minibatch(minibatch_db, self._num_classes)
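The index bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual RoIDataLayer code: the class name MinibatchSampler, the fixed seed and the toy roidb are all hypothetical.

```python
import numpy as np


class MinibatchSampler:
    """Hypothetical stand-in for the index bookkeeping in RoIDataLayer."""

    def __init__(self, num_images, ims_per_batch=2, seed=0):
        self._num_images = num_images
        self._ims_per_batch = ims_per_batch  # plays the role of cfg.TRAIN.IMS_PER_BATCH
        self._rng = np.random.RandomState(seed)
        self._shuffle()

    def _shuffle(self):
        # Draw a fresh random ordering of all image indices.
        self._perm = self._rng.permutation(self._num_images)
        self._cur = 0

    def next_inds(self):
        # Reshuffle when the remaining indices cannot fill a full minibatch.
        if self._cur + self._ims_per_batch > self._num_images:
            self._shuffle()
        inds = self._perm[self._cur:self._cur + self._ims_per_batch]
        self._cur += self._ims_per_batch
        return inds


# Toy roidb with five entries (assumed data, for illustration only).
roidb = [{'image': 'img_%d.jpg' % i} for i in range(5)]
sampler = MinibatchSampler(num_images=len(roidb))
db_inds = sampler.next_inds()
minibatch_db = [roidb[i] for i in db_inds]
```

Each call to next_inds returns two distinct indices, so minibatch_db always holds two roidb entries.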
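Step into get_minibatch (inside this function the list minibatch_db is named roidb):
It first rescales the images and records the scale factors im_scales, then converts the image list into the blob format Caffe expects:
im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
Here im_blob has a batch size of 2.
Then im_blob is added to the blobs dict: blobs = {'data': im_blob}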
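To illustrate how two rescaled images of different sizes can share one blob with batch size 2, the sketch below zero-pads every image to the largest height and width and reorders the axes to Caffe's (N, C, H, W) layout. The function name im_list_to_blob_sketch and the tiny image sizes are assumptions made for illustration; this is not presented as the code that _get_image_blob actually runs.

```python
import numpy as np


def im_list_to_blob_sketch(ims):
    """Stack H x W x 3 images into one zero-padded (N, 3, H_max, W_max) array."""
    # Largest height and width over all images in the minibatch.
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
    for i, im in enumerate(ims):
        # Copy each image into the top-left corner; the rest stays zero.
        blob[i, :im.shape[0], :im.shape[1], :] = im
    # Reorder (N, H, W, C) to (N, C, H, W).
    return blob.transpose((0, 3, 1, 2))


# Two toy "images" of different sizes (assumed data).
ims = [np.ones((6, 8, 3), dtype=np.float32),
       np.ones((5, 9, 3), dtype=np.float32)]
im_blob = im_list_to_blob_sketch(ims)
```

The first axis of im_blob has length 2, which is the batch size mentioned above.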
Because cfg.TRAIN.HAS_RPN=False, the else branch runs. It first initializes empty arrays that will be filled and then added to the blobs dict:
rois_blob = np.zeros((0,5),dtype=np.float32)
labels_blob = np.zeros((0),dtype=np.float32)
bbox_targets_blob = np.zeros((0,4 * num_classes),dtype=np.float32)
bbox_inside_blob = np.zeros(bbox_targets_blob.shape,dtype=np.float32)
Then it loops over the image list (only two images) and computes labels, overlaps, im_rois, bbox_targets and bbox_inside_weights for each one:
for im_i in range(num_images):
    labels, overlaps, im_rois, bbox_targets, bbox_inside_weights \
        = _sample_rois(roidb[im_i], fg_rois_per_image, rois_per_image, num_classes)
Step into _sample_rois:
First, the inputs:
roidb[im_i]: the roidb entry for image im_i
rois_per_image: 64 by default (cfg.TRAIN.BATCH_SIZE=128 split over 2 images)
fg_rois_per_image: 16 by default (cfg.TRAIN.FG_FRACTION=0.25 of 64)
num_classes: 21
What _sample_rois does: it cuts the proposals for the image (at most 2000) down to 64. At most 16 of these are foreground proposals (selected by cfg.TRAIN.FG_THRESH, sampled at random if there are more than 16), and the rest are background proposals, between 48 and 64 of them when enough are available (again sampled at random if there are too many). Once the proposals have been cut down to 64, the ones that remain are referred to as rois.
Now for what the function returns. It starts from:
labels = roidb['max_classes']
overlaps = roidb['max_overlaps']
rois = roidb['boxes']
If the indices of the retained rois are keep_inds, the returned values are:
labels = labels[keep_inds]  # foreground labels keep their object class index
labels[fg_rois_per_this_image:] = 0  # background labels are set to 0
overlaps = overlaps[keep_inds]
rois = rois[keep_inds]
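The sampling just described can be sketched as below. It is a simplified illustration, not the original _sample_rois: the function name sample_rois_sketch, the threshold defaults (0.5 for foreground, 0.1 to 0.5 for background) and the random toy data are assumptions.

```python
import numpy as np


def sample_rois_sketch(max_overlaps, max_classes, rois_per_image=64,
                       fg_rois_per_image=16, fg_thresh=0.5,
                       bg_thresh_hi=0.5, bg_thresh_lo=0.1, rng=None):
    """Pick at most 16 foreground and fill the rest with background RoIs."""
    if rng is None:
        rng = np.random.RandomState(0)

    # Foreground: overlap with a ground-truth box of at least fg_thresh.
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    fg_count = min(fg_rois_per_image, fg_inds.size)
    if fg_inds.size > 0:
        fg_inds = rng.choice(fg_inds, size=fg_count, replace=False)

    # Background: overlap in [bg_thresh_lo, bg_thresh_hi).
    bg_inds = np.where((max_overlaps < bg_thresh_hi) &
                       (max_overlaps >= bg_thresh_lo))[0]
    bg_count = min(rois_per_image - fg_count, bg_inds.size)
    if bg_inds.size > 0:
        bg_inds = rng.choice(bg_inds, size=bg_count, replace=False)

    # Foreground indices first, then background indices.
    keep_inds = np.append(fg_inds, bg_inds)
    labels = max_classes[keep_inds].copy()
    labels[fg_count:] = 0  # everything after the foreground block is background
    return keep_inds, labels, fg_count


# Toy data standing in for 2000 proposals (assumed values).
data_rng = np.random.RandomState(1)
max_overlaps = data_rng.uniform(0.0, 1.0, size=2000)
max_classes = data_rng.randint(1, 21, size=2000)
keep_inds, labels, fg_count = sample_rois_sketch(max_overlaps, max_classes)
```

With plenty of proposals on both sides of the thresholds, keep_inds has 64 entries: 16 foreground followed by 48 background.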
Finally, two more values are returned:
bbox_targets, bbox_inside_weights = _get_bbox_regression_labels(
roidb['bbox_targets'][keep_inds, :], num_classes)
Both come from the return value of _get_bbox_regression_labels.
Now look at _get_bbox_regression_labels:
Its first input is the bbox_targets entry of the roidb dict, roidb['bbox_targets'][keep_inds, :]. Like rois, it keeps only the rows selected by keep_inds. The second input is num_classes=21.
What _get_bbox_regression_labels does: it expands the matrix roidb['bbox_targets'][keep_inds, :] from len(keep_inds) rows by 5 columns to len(keep_inds) rows by 84 columns. In each row of the returned bbox_targets, only the 4 columns belonging to that row's object class are non-zero (they hold the last 4 columns of the original matrix), and the other 80 columns are 0. If a row corresponds to background, the whole row is 0.
_get_bbox_regression_labels also returns bbox_inside_weights, a len(keep_inds) by 84 matrix aligned with bbox_targets. It is 0 everywhere except in the 4 columns of each foreground row's class, where it takes the default value (1.0, 1.0, 1.0, 1.0).
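The expansion from 5 columns to 84 columns can be sketched as follows. This is a minimal illustration assuming the first column holds the class index and the remaining four hold the regression targets; the function name expand_bbox_targets and the toy input are hypothetical.

```python
import numpy as np


def expand_bbox_targets(bbox_target_data, num_classes,
                        inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Expand N x 5 rows (class, 4 targets) into N x 4*num_classes matrices."""
    clss = bbox_target_data[:, 0].astype(int)
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    # Only foreground rows (class > 0) receive non-zero entries.
    for ind in np.where(clss > 0)[0]:
        start = 4 * clss[ind]
        bbox_targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:start + 4] = inside_weights
    return bbox_targets, bbox_inside_weights


# One foreground row of class 3 and one background row (assumed values).
data = np.array([[3, 0.1, -0.2, 0.3, 0.4],
                 [0, 0.0, 0.0, 0.0, 0.0]], dtype=np.float32)
targets, weights = expand_bbox_targets(data, num_classes=21)
```

For the class 3 row, the four targets land in columns 12 to 15, and the background row stays all zero.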
Back in get_minibatch, _sample_rois has returned a series of values: labels, overlaps, im_rois, bbox_targets and bbox_inside_weights. The loop body then processes them:
rois = _project_im_rois(im_rois, im_scales[im_i])
This maps the coordinates in im_rois onto the rescaled image, because all processing happens on the rescaled image while im_rois is expressed in original-image coordinates.
Next, one column is prepended to rois:
batch_ind = im_i * np.ones((rois.shape[0], 1))
rois_blob_this_image = np.hstack((batch_ind, rois))
So rois_blob_this_image is a matrix with rois.shape[0] rows and 5 columns. A 0 in the first column marks the first image of the batch, and a 1 marks the second. (One might worry that np.hstack could fail if the two images end up with different rois.shape[0]. It cannot: batch_ind is built from rois.shape[0] of the same image, so the two operands of np.hstack always have the same number of rows, and the later np.vstack only needs matching column counts. In practice each image usually contributes 64 rows anyway, because there are plenty of proposals, up to 2000, and rois_per_image is only 64.)
Then rois_blob_this_image is appended to rois_blob:
rois_blob = np.vstack((rois_blob, rois_blob_this_image))
After the two loop iterations, rois_blob holds the data for both images.
The other variables labels_blob, bbox_targets_blob and bbox_inside_blob are merged the same way, so that the data for both images ends up together:
labels_blob = np.hstack((labels_blob, labels))
bbox_targets_blob = np.vstack((bbox_targets_blob, bbox_targets))
bbox_inside_blob = np.vstack((bbox_inside_blob, bbox_inside_weights))
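The per-image loop can be condensed into the sketch below, which projects the RoIs, prepends the batch index and stacks the results. It is an illustration under assumptions: the function name assemble_roi_blobs and the toy inputs are hypothetical, and projection is reduced to a plain multiplication by the image scale. The two toy images deliberately contribute different numbers of RoIs to show that stacking only needs matching column counts.

```python
import numpy as np


def assemble_roi_blobs(per_image, im_scales, num_classes=21):
    """Stack per-image RoIs, labels and targets into minibatch-level arrays."""
    rois_blob = np.zeros((0, 5), dtype=np.float32)
    labels_blob = np.zeros((0,), dtype=np.float32)
    bbox_targets_blob = np.zeros((0, 4 * num_classes), dtype=np.float32)
    for im_i, (im_rois, labels, bbox_targets) in enumerate(per_image):
        # Map original-image coordinates onto the rescaled image.
        rois = im_rois * im_scales[im_i]
        # First column: index of the image inside the minibatch.
        batch_ind = im_i * np.ones((rois.shape[0], 1))
        rois_blob_this_image = np.hstack((batch_ind, rois))
        rois_blob = np.vstack((rois_blob, rois_blob_this_image))
        labels_blob = np.hstack((labels_blob, labels))
        bbox_targets_blob = np.vstack((bbox_targets_blob, bbox_targets))
    return rois_blob, labels_blob, bbox_targets_blob


# Image 0 contributes 3 RoIs and image 1 contributes 2 (assumed values).
per_image = [
    (np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 2, 3, 4]], dtype=np.float32),
     np.array([7, 0, 0], dtype=np.float32),
     np.zeros((3, 84), dtype=np.float32)),
    (np.array([[2, 2, 8, 8], [0, 0, 4, 4]], dtype=np.float32),
     np.array([12, 0], dtype=np.float32),
     np.zeros((2, 84), dtype=np.float32)),
]
rois_blob, labels_blob, bbox_targets_blob = assemble_roi_blobs(
    per_image, im_scales=[2.0, 1.5])
```

The first column of rois_blob is 0 for the three rows of the first image and 1 for the two rows of the second.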
Finally, the results are added to the blobs dict:
blobs['rois'] = rois_blob
blobs['labels'] = labels_blob
blobs['bbox_targets'] = bbox_targets_blob
blobs['bbox_inside_weights'] = bbox_inside_blob
blobs['bbox_outside_weights'] = np.array(bbox_inside_blob > 0).astype(np.float32)  # matrix of 0s and 1s: 1 where bbox_inside_blob is non-zero, 0 elsewhere
get_minibatch then returns the blobs dict.
Back in _get_next_minibatch and then forward, we now have the blobs dict: blobs = self._get_next_minibatch().
forward then copies each entry of blobs into the matching top, and the tops are the output of forward.
That completes data preparation. The resulting data is then fed through the rest of the network.
It starts with 5 convolutional layers, which we have seen several times already:
#========= conv1-conv5 ============
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 96
kernel_size: 7
pad: 3
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 3
alpha: 0.00005
beta: 0.75
norm_region: WITHIN_CHANNEL
engine: CAFFE
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
kernel_size: 3
stride: 2
pad: 1
pool: MAX
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 256
kernel_size: 5
pad: 2
stride: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 3
alpha: 0.00005
beta: 0.75
norm_region: WITHIN_CHANNEL
engine: CAFFE
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
kernel_size: 3
stride: 2
pad: 1
pool: MAX
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 384
kernel_size: 3
pad: 1
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 384
kernel_size: 3
pad: 1
stride: 1
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
Next comes the RoI pooling layer:
The code for the RoI layer lives in a cpp file in Fast R-CNN.
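Since the layer itself is not listed here, the following NumPy sketch gives a rough idea of what RoI max pooling computes for one RoI: the RoI is scaled onto the conv5 feature map, split into a fixed grid of bins, and each bin is max-pooled. The function name roi_max_pool and the parameters pooled_h=6, pooled_w=6 and spatial_scale=0.0625 are assumptions for illustration, and np.round may differ from the C++ rounding on exact .5 values, so treat this as an approximation rather than the layer's implementation.

```python
import numpy as np


def roi_max_pool(feature_map, roi, pooled_h=6, pooled_w=6, spatial_scale=0.0625):
    """Max-pool one RoI of a (C, H, W) feature map into (C, pooled_h, pooled_w).

    roi is (x1, y1, x2, y2) in input-image pixels.
    """
    channels, height, width = feature_map.shape
    # Project the RoI corners onto the feature map.
    x1, y1, x2, y2 = [int(np.round(v * spatial_scale)) for v in roi]
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    bin_h = roi_h / float(pooled_h)
    bin_w = roi_w / float(pooled_w)

    out = np.zeros((channels, pooled_h, pooled_w), dtype=feature_map.dtype)
    for ph in range(pooled_h):
        # Row range of this bin, clipped to the feature map.
        hstart = min(max(int(np.floor(ph * bin_h)) + y1, 0), height)
        hend = min(max(int(np.ceil((ph + 1) * bin_h)) + y1, 0), height)
        for pw in range(pooled_w):
            # Column range of this bin, clipped to the feature map.
            wstart = min(max(int(np.floor(pw * bin_w)) + x1, 0), width)
            wend = min(max(int(np.ceil((pw + 1) * bin_w)) + x1, 0), width)
            # Empty bins keep the value 0.
            if hend > hstart and wend > wstart:
                window = feature_map[:, hstart:hend, wstart:wend]
                out[:, ph, pw] = window.max(axis=(1, 2))
    return out


# A 1-channel 12 x 12 toy feature map and an RoI covering all of it.
feature_map = np.arange(144, dtype=np.float32).reshape(1, 12, 12)
pooled = roi_max_pool(feature_map, roi=(0, 0, 176, 176))
```

Whatever the size of the RoI, the output has a fixed spatial size, which is what lets the fully connected layers that follow accept it.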
After the RoI layer come two fully connected layers:
layer {
name: "fc6"
type: "InnerProduct"
bottom: "roi_pool_conv5"
top: "fc6"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
scale_train: false
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
scale_train: false
}
}
After fc7 the network splits into two branches: one predicts the object class and the other predicts the box coordinates.
layer {
name: "cls_score"
type: "InnerProduct"
bottom: "fc7"
top: "cls_score"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 21
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
This fully connected layer predicts the class. Its output is cls_score.
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "fc7"
top: "bbox_pred"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
inner_product_param {
num_output: 84
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}
Likewise, a fully connected layer predicts the box coordinates. Its output is bbox_pred.
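To make the shapes of the two branches concrete, here is a small NumPy sketch of two linear heads on top of fc7. It assumes 128 RoIs per minibatch (2 images times 64 RoIs), random stand-in activations, and zero biases as in the fillers above; the variable names W_cls and W_box are hypothetical, and the last two lines (picking the box of the top-scoring class) are an illustrative extra rather than part of training.

```python
import numpy as np

rng = np.random.RandomState(0)
num_rois, fc7_dim, num_classes = 128, 4096, 21

# Stand-in fc7 activations, one row per RoI (assumed random data).
fc7 = rng.randn(num_rois, fc7_dim).astype(np.float32)

# Gaussian-initialized weights with the std values from the fillers above.
W_cls = (0.01 * rng.randn(fc7_dim, num_classes)).astype(np.float32)
W_box = (0.001 * rng.randn(fc7_dim, 4 * num_classes)).astype(np.float32)

# Both heads read the same fc7 features; biases start at 0 and are omitted.
cls_score = fc7.dot(W_cls)   # one score per class for each RoI
bbox_pred = fc7.dot(W_box)   # 4 regression values per class for each RoI

# Illustrative extra: the 4 predicted offsets of each RoI's top-scoring class.
pred_class = cls_score.argmax(axis=1)
top_boxes = bbox_pred.reshape(num_rois, num_classes, 4)[np.arange(num_rois), pred_class]
```

The 84 outputs of bbox_pred are 4 values for each of the 21 classes, which matches the 84-column layout of bbox_targets.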
Finally, the loss is computed from cls_score and bbox_pred:
layer {
name: "loss_cls"
type: "SoftmaxWithLoss"
bottom: "cls_score"
bottom: "labels"
propagate_down: 1
propagate_down: 0
top: "cls_loss"
loss_weight: 1
loss_param {
ignore_label: -1
normalize: true
}
}
layer {
name: "loss_bbox"
type: "SmoothL1Loss"
bottom: "bbox_pred"
bottom: "bbox_targets"
bottom: "bbox_inside_weights"
bottom: "bbox_outside_weights"
top: "bbox_loss"
loss_weight: 1
}
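The role of the four bottoms of loss_bbox can be sketched in NumPy as follows. This is a simplified illustration of a smooth L1 loss with inside and outside weights, assuming sigma=1 and normalization by the number of RoIs; the function name smooth_l1_loss and the toy values are hypothetical, and the real layer may differ in details.

```python
import numpy as np


def smooth_l1_loss(bbox_pred, bbox_targets, inside_w, outside_w, sigma=1.0):
    """Weighted smooth L1 loss, averaged over the number of rows (RoIs)."""
    sigma2 = sigma ** 2
    # Inside weights mask out columns that do not belong to the RoI's class.
    diff = inside_w * (bbox_pred - bbox_targets)
    abs_diff = np.abs(diff)
    # Quadratic near zero, linear for large errors.
    per_elem = np.where(abs_diff < 1.0 / sigma2,
                        0.5 * sigma2 * diff ** 2,
                        abs_diff - 0.5 / sigma2)
    # Outside weights scale each element's contribution to the total.
    return float(np.sum(outside_w * per_elem) / bbox_pred.shape[0])


# Row 0 is a foreground RoI, row 1 is background (assumed values).
bbox_pred = np.array([[0.5, 2.0, 0.0, 0.0],
                      [1.0, 1.0, 1.0, 1.0]])
bbox_targets = np.zeros((2, 4))
inside_w = np.array([[1.0, 1.0, 1.0, 1.0],
                     [0.0, 0.0, 0.0, 0.0]])
outside_w = (inside_w > 0).astype(np.float64)
loss = smooth_l1_loss(bbox_pred, bbox_targets, inside_w, outside_w)
```

The background row contributes nothing because both of its weight rows are zero, so only the foreground RoI's class columns drive the box regression.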
Finally, back in train_fast_rcnn:
Training yields the Fast R-CNN snapshots, whose paths are stored in the list model_paths. All snapshot files except the most recent one are then deleted:
for i in model_paths[:-1]:
    os.remove(i)
The one remaining entry is stored in fast_rcnn_model_path: fast_rcnn_model_path = model_paths[-1]
fast_rcnn_model_path is pushed onto the child process's queue as a dict:
queue.put({'model_path': fast_rcnn_model_path})
At this point the child process has been set up: p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
Next, start the process: p.start()
Fetch the trained Fast R-CNN model from the child process's queue: fast_rcnn_stage1_out = mp_queue.get()
Wait for the child process to finish: p.join()