1. output_alt_opt
faster_rcnn_alt_opt.sh
train_faster_rcnn_alt_opt.py
Stage 1 RPN, init from ImageNet model
RPN training procedure:
In train_rpn:
cfg.TRAIN.PROPOSAL_METHOD = 'gt'
sets the proposal mode; gt_roidb in pascal_voc.py will be called later
cfg.TRAIN.IMS_PER_BATCH = 1
get_roidb prepares the roidb and imdb
train_net trains the RPN
In get_roidb:
imdb = get_imdb(imdb_name)
initializes the imdb class (via factory.py and pascal_voc.py)
during RPN training rpn_file=None, so only the ground-truth boxes are available
roidb = get_training_roidb(imdb)
calls get_training_roidb in train.py to obtain the roidb
In get_training_roidb:
imdb.append_flipped_images()
(in imdb.py) Horizontal flipping serves as data augmentation: the roidb[i]['boxes'] produced by gt_roidb (pascal_voc.py) are mirrored and the image list is doubled
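As a minimal sketch (not the library code) of what the flip does to each box, assuming image width W and the 0-based inclusive coordinates used below:

```python
import numpy as np

def flip_boxes(boxes, width):
    # Mirror [x1, y1, x2, y2] boxes horizontally: the new x1 comes from
    # the old x2 (and vice versa), keeping 0-based inclusive coordinates.
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped

boxes = np.array([[10, 5, 49, 30]])
print(flip_boxes(boxes, 100))  # [[50  5 89 30]]
```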
rdl_roidb.prepare_roidb(imdb)
(in roidb.py) adds roidb[i]['max_classes'], roidb[i]['max_overlaps'], roidb[i]['image'], roidb[i]['width'], roidb[i]['height']
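A sketch of how max_classes/max_overlaps fall out of the (num_objs, num_classes) gt_overlaps matrix; the real roidb stores gt_overlaps as a scipy sparse matrix and calls toarray() first:

```python
import numpy as np

gt_overlaps = np.array([[0.0, 1.0, 0.0],   # box 0: class 1, overlap 1.0
                        [0.0, 0.0, 0.8]])  # box 1: class 2, overlap 0.8
max_overlaps = gt_overlaps.max(axis=1)    # best overlap per box
max_classes = gt_overlaps.argmax(axis=1)  # class achieving that overlap
print(max_overlaps)  # [1.  0.8]
print(max_classes)   # [1 2]
```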
In gt_roidb:
parses the annotation XML files (the ground truth) under
/data/VOCdevkit2007/VOC2007/Annotations
to obtain gt_roidb
Each gt_roidb entry contains:
{'boxes' : boxes,
'gt_classes': gt_classes,
'gt_overlaps' : overlaps,
'flipped' : False,
'seg_areas' : seg_areas}
The array shapes in gt_roidb (from _load_pascal_annotation):
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
# "Seg" area for pascal is just the box area
seg_areas = np.zeros((num_objs), dtype=np.float32)

# Load object bounding boxes into a data frame.
for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # Make pixel indexes 0-based
    x1 = float(bbox.find('xmin').text) - 1
    y1 = float(bbox.find('ymin').text) - 1
    x2 = float(bbox.find('xmax').text) - 1
    y2 = float(bbox.find('ymax').text) - 1
    cls = self._class_to_ind[obj.find('name').text.lower().strip()]
    boxes[ix, :] = [x1, y1, x2, y2]
    gt_classes[ix] = cls
    overlaps[ix, cls] = 1.0
    seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
After get_roidb, train_rpn calls train_net to train:
model_paths = train_net(solver, roidb, output_dir, pretrained_model=init_model, max_iters=max_iters)
train_net in train.py:
roidb = filter_roidb(roidb)
drops images that have neither a foreground nor a background RoI (overlaps in [0, 0.5) count as background, [0.5, 1] as foreground); during RPN training the roidb holds only ground truth with overlaps = 1, so the filter removes nothing here
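The foreground/background test can be sketched like this (thresholds taken from the defaults in config.py; simplified from the real filter_roidb):

```python
import numpy as np

FG_THRESH, BG_THRESH_HI, BG_THRESH_LO = 0.5, 0.5, 0.0  # defaults from config.py

def is_valid(entry):
    # Keep an image only if it has at least one fg or at least one bg RoI.
    overlaps = entry['max_overlaps']
    has_fg = np.any(overlaps >= FG_THRESH)
    has_bg = np.any((overlaps < BG_THRESH_HI) & (overlaps >= BG_THRESH_LO))
    return has_fg or has_bg

# A ground-truth-only entry (overlaps all 1.0) always passes, so the
# filter is a no-op during RPN training.
entry = {'max_overlaps': np.array([1.0, 1.0])}
print(is_valid(entry))  # True
```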
sw = SolverWrapper(solver_prototxt, roidb, output_dir, pretrained_model=pretrained_model)
loads the pretrained model, solver_prototxt, and so on
stage1_rpn_train.pt
name: "ZF"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}
This instantiates the roi_data_layer.layer Python layer, which in turn calls get_minibatch in minibatch.py.
get_minibatch
rois_per_image:
the maximum number of RoIs per image, BATCH_SIZE / IMS_PER_BATCH = 128/1 = 128 // unused when HAS_RPN is set
fg_rois_per_image:
0.25 * rois_per_image = 32 foregrounds // unused when HAS_RPN is set
im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
yields the scale factors (ratio between the network input and the original image) and im_blob, the network input blob
if cfg.TRAIN.HAS_RPN:
(the RPN-training branch)
gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
indices of the ground-truth (non-background) boxes
gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
blobs['gt_boxes'] = gt_boxes
gt_boxes has 5 columns: the first 4 are the (scaled) coordinates, the last is the class label; one row per ground-truth box
blobs['im_info'] = np.array([[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],dtype=np.float32)
im_scale = float(target_size) / float(im_size_min) = 600/min(P,Q); (im_blob.shape[2], im_blob.shape[3]) = (M,N); min(M,N) = 600 (unless the long side is capped by cfg.TRAIN.MAX_SIZE = 1000)
The function returns the blobs dict
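The scaling rule, sketched (target_size=600 and MAX_SIZE=1000 as in the default config; simplified from prep_im_for_blob):

```python
def compute_scale(h, w, target_size=600, max_size=1000):
    # Scale so the short side hits target_size, but cap the long side.
    im_size_min, im_size_max = min(h, w), max(h, w)
    scale = float(target_size) / im_size_min
    if round(scale * im_size_max) > max_size:
        scale = float(max_size) / im_size_max
    return scale

print(compute_scale(375, 500))   # 1.6 -> a 600 x 800 network input
print(compute_scale(300, 1000))  # capped by the long side: 1.0, not 2.0
```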
layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2**np.arange(3, 6)):
The 9 anchor shapes are defined once at network start-up; they are the anchors of the first cell of the feature map
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # [0, 0, 15, 15]
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    '''[[ -3.5,   2. ,  18.5,  13. ],
        [  0. ,   0. ,  15. ,  15. ],
        [  2.5,  -3. ,  12.5,  18. ]]'''
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])
    '''
    [[ -84.  -40.   99.   55.]
     [-176.  -88.  191.  103.]
     [-360. -184.  375.  199.]
     [ -56.  -56.   71.   71.]
     [-120. -120.  135.  135.]
     [-248. -248.  263.  263.]
     [ -36.  -80.   51.   95.]
     [ -80. -168.   95.  183.]
     [-168. -344.  183.  359.]]
    '''
    return anchors
anchor_target_layer.py:
Produces the training targets and labels for each anchor, classifying it as 1 (object), 0 (not object), or -1 (ignore). When label > 0, i.e. there is an object, box regression is performed.
The forward function: generates the 9 anchors at every cell, fills in their details, filters out anchors that extend beyond the image, and measures their overlap with the ground truth.
1. Generate proposals: A anchors times K shifts, where A = 9 and K = H*W (H, W are the feature-map height and width; e.g. 61 x 36 points sampled uniformly over the image). shift_x and shift_y are the offsets of these points in the input image; adding each shift to the 9 anchor coordinates gives every feature-map cell its own 9 anchors. H x feat_stride and W x feat_stride roughly equal the size of each rescaled image, with feat_stride = 16.
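Step 1 can be sketched as follows: broadcasting the K = H*W shifts against the A base anchors (shown with a single base anchor for brevity):

```python
import numpy as np

def shift_anchors(base_anchors, height, width, feat_stride=16):
    # One (x, y, x, y) shift per feature-map cell.
    shift_x = np.arange(width) * feat_stride
    shift_y = np.arange(height) * feat_stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.vstack((sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel())).T
    A, K = base_anchors.shape[0], shifts.shape[0]
    # Broadcast (1, A, 4) + (K, 1, 4) -> (K, A, 4) -> (K*A, 4).
    all_anchors = (base_anchors.reshape(1, A, 4) +
                   shifts.reshape(K, 1, 4)).reshape(K * A, 4)
    return all_anchors

base = np.array([[-84, -40, 99, 55]])  # one of the 9 anchors listed above
print(shift_anchors(base, 2, 3).shape)  # (6, 4)
```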
2. Remove anchors that cross the image boundary (roughly 2/3 are discarded)
3. Initialize all labels to -1
4. Compute the overlaps between anchors and gt_boxes
5. labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
==> 0.3
labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
==> 0.7
6. Subsample the positives and negatives (at most a 1:1 ratio; negatives fill whatever the positives leave)
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
num_fg is at most 0.5 * 256 = 128
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
the number of negatives
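The subsampling works by disabling surplus labels (setting them to -1) rather than re-weighting; a simplified sketch of the logic in anchor_target_layer.py, using the default RPN_FG_FRACTION=0.5 and RPN_BATCHSIZE=256:

```python
import numpy as np
import numpy.random as npr

def subsample(labels, batch_size=256, fg_fraction=0.5):
    # Keep at most fg_fraction*batch_size positives; the excess become -1.
    num_fg = int(fg_fraction * batch_size)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable = npr.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[disable] = -1
    # Fill the rest of the batch with negatives, disabling the excess.
    num_bg = batch_size - np.sum(labels == 1)
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable = npr.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[disable] = -1
    return labels

labels = subsample(np.zeros(5000))  # no positives: 256 negatives survive
print(np.sum(labels == 0))  # 256
```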
7狞换、得到bbox_targets((len(inds_inside), 4):負(fù)樣本是全是0避咆、bbox_inside_weights((len(inds_inside), 4)z正樣本四個(gè)數(shù)都賦值為cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS=1、bbox_outside_weights((len(inds_inside), 4)如下:
if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # default -1
    # uniform weighting of examples (given non-uniform sampling)
    num_examples = np.sum(labels >= 0)
    positive_weights = np.ones((1, 4)) * 1.0 / num_examples
    negative_weights = np.ones((1, 4)) * 1.0 / num_examples
bbox_outside_weights[labels == 1, :] = positive_weights
bbox_outside_weights[labels == 0, :] = negative_weights
8哀澈、
_unmap:
all_anchors裁減掉了2/3左右牌借,僅僅保留在圖像內(nèi)的anchor,這里就是將其復(fù)原作為下一層的輸入了割按,并reshape成相應(yīng)的格式
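A minimal sketch of the _unmap scatter:

```python
import numpy as np

def unmap(data, count, inds, fill=-1):
    # Scatter `data` (computed for the kept, inside-image anchors) back to
    # the original set of `count` anchors, padding everything else with `fill`.
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret

labels = np.array([1, 0, 1], dtype=np.float32)
print(unmap(labels, 6, np.array([0, 2, 5])))  # [ 1. -1.  0. -1. -1.  1.]
```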
Stage 1 RPN, generate proposals
In rpn_generate:
cfg.TEST.RPN_PRE_NMS_TOP_N = -1  # no pre NMS filtering
cfg.TEST.RPN_POST_NMS_TOP_N = 2000
at most 2000 proposals are produced
rpn_net = caffe.Net(rpn_test_prototxt, rpn_model_path, caffe.TEST)
runs the RPN model trained above with the rpn_test.pt prototxt to generate the proposals
In rpn_test.pt:
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
rpn.proposal_layer
--> proposal_layer.py: converts the RPN outputs into object proposals. The ProposalLayer class overrides the setup and forward functions.
forward:
generates the anchor boxes and fills in the box parameters for each anchor
clips the predicted boxes to the image and removes boxes whose width or height is below the threshold (16 * im_info[2])
sorts all (proposal, score) pairs by score
takes the top pre_nms_topN proposals (skipped here, since cfg.TEST.RPN_PRE_NMS_TOP_N = -1)
applies NMS with threshold 0.7
takes the top after_nms_topN proposals; cfg.TEST.RPN_POST_NMS_TOP_N = 2000, so the first 2000 are kept (if fewer than 2000 survive NMS, all of them are kept)
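The clip-and-filter steps, sketched (simplified from proposal_layer.py; im_info = (M, N, scale) and min_size = 16):

```python
import numpy as np

def clip_and_filter(proposals, scores, im_info, min_size=16):
    h, w, scale = im_info
    # Clip proposals to the image boundary.
    proposals[:, 0::2] = np.clip(proposals[:, 0::2], 0, w - 1)
    proposals[:, 1::2] = np.clip(proposals[:, 1::2], 0, h - 1)
    # Drop boxes smaller than min_size at the input image scale.
    ws = proposals[:, 2] - proposals[:, 0] + 1
    hs = proposals[:, 3] - proposals[:, 1] + 1
    keep = np.where((ws >= min_size * scale) & (hs >= min_size * scale))[0]
    proposals, scores = proposals[keep], scores[keep]
    # Sort by score, descending, before NMS.
    order = scores.ravel().argsort()[::-1]
    return proposals[order], scores[order]

p = np.array([[-10., 0., 200., 100.], [0., 0., 5., 5.]])
s = np.array([[0.3], [0.9]])
p2, s2 = clip_and_filter(p, s, (100, 150, 1.0))
print(p2)  # only the large box survives, clipped to [0, 0, 149, 99]
```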
Stage 1 Fast R-CNN using RPN proposals, init from ImageNet model
In train_fast_rcnn:
cfg.TRAIN.PROPOSAL_METHOD = 'rpn'
rpn_roidb in pascal_voc.py will be called
cfg.TRAIN.IMS_PER_BATCH = 2
get_roidb
prepares the roidb and imdb; train_net then trains Fast R-CNN
In get_roidb:
imdb = get_imdb(imdb_name)
initializes the imdb class (via factory.py and pascal_voc.py)
unlike RPN training, rpn_file now points to the proposals generated in the previous stage, used alongside the ground-truth boxes
roidb = get_training_roidb(imdb)
calls get_training_roidb in train.py to obtain the roidb
In get_training_roidb:
imdb.append_flipped_images()
(in imdb.py) Horizontal flipping as data augmentation: the roidb[i]['boxes'] produced by rpn_roidb (pascal_voc.py) are mirrored and the image list is doubled
rdl_roidb.prepare_roidb(imdb)
(in roidb.py) adds roidb[i]['max_classes'], roidb[i]['max_overlaps'], roidb[i]['image'], roidb[i]['width'], roidb[i]['height']
In rpn_roidb:
gt_roidb = self.gt_roidb()
first load the ground truth, gt_roidb
rpn_roidb = self._load_rpn_roidb(gt_roidb)
loads rpn_roidb from rpn_file; its 'gt_overlaps' holds each proposal's maximum overlap with the ground truth, so only one of the num_classes columns is non-zero
roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb)
stacks the two together
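merge_roidbs stacks the per-image fields; a sketch with dense arrays (the real gt_overlaps is scipy sparse):

```python
import numpy as np

def merge_entries(gt, rpn):
    # Ground-truth boxes come first, then the RPN proposals for the same image.
    return {
        'boxes': np.vstack((gt['boxes'], rpn['boxes'])),
        'gt_classes': np.hstack((gt['gt_classes'], rpn['gt_classes'])),
        'gt_overlaps': np.vstack((gt['gt_overlaps'], rpn['gt_overlaps'])),
    }

gt = {'boxes': np.zeros((2, 4)), 'gt_classes': np.array([1, 5]),
      'gt_overlaps': np.eye(21)[[1, 5]]}
rpn = {'boxes': np.ones((3, 4)), 'gt_classes': np.zeros(3, dtype=int),
       'gt_overlaps': np.zeros((3, 21))}
merged = merge_entries(gt, rpn)
print(merged['boxes'].shape)  # (5, 4)
```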
After get_roidb, train_fast_rcnn calls train_net to train:
model_paths = train_net(solver, roidb, output_dir, pretrained_model=init_model, max_iters=max_iters)
train_net in train.py:
roidb = filter_roidb(roidb)
drops images with neither a foreground nor a background RoI (overlaps in [0, 0.5) background, [0.5, 1] foreground)
sw = SolverWrapper(solver_prototxt, roidb, output_dir, pretrained_model=pretrained_model)
loads the pretrained model and computes the regression targets via rdl_roidb.add_bbox_regression_targets(roidb)
add_bbox_regression_targets in roidb.py:
roidb[im_i]['bbox_targets'] = _compute_targets(rois, max_overlaps, max_classes)
(the subsequent mean/variance normalization of the targets is not examined here)
_compute_targets:
gt_inds:
ground-truth ROIs
ex_inds:
fg ROIs (the overlap threshold is 0.5, so ex_inds includes gt_inds, whose overlaps equal 1)
ex_gt_overlaps:
overlaps between each ex ROI and each gt ROI, shape num_ex x num_gt
gt_assignment:
for each ex ROI, the index of the gt ROI with maximum overlap
gt_rois, ex_rois:
the boxes indexed by gt_inds and ex_inds
targets:
shape rois.shape[0] x 5; column 0 holds the label, written only at ex_inds (other rows stay 0). The remaining 4 columns are the regression offsets, again written only at ex_inds. As noted, ex_inds includes gt_inds, but a gt box's offset to itself is 0, so the gt boxes contribute no regression.
def _compute_targets(rois, overlaps, labels):
    """Compute bounding-box regression targets for an image."""
    # Indices of ground-truth ROIs
    gt_inds = np.where(overlaps == 1)[0]
    if len(gt_inds) == 0:
        # Bail if the image has no ground-truth ROIs
        return np.zeros((rois.shape[0], 5), dtype=np.float32)
    # Indices of examples for which we try to make predictions
    ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]

    # Get IoU overlap between each ex ROI and gt ROI
    ex_gt_overlaps = bbox_overlaps(
        np.ascontiguousarray(rois[ex_inds, :], dtype=np.float),
        np.ascontiguousarray(rois[gt_inds, :], dtype=np.float))

    # Find which gt ROI each ex ROI has max overlap with:
    # this will be the ex ROI's gt target
    gt_assignment = ex_gt_overlaps.argmax(axis=1)
    gt_rois = rois[gt_inds[gt_assignment], :]
    ex_rois = rois[ex_inds, :]

    targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
    targets[ex_inds, 0] = labels[ex_inds]
    targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois)
    return targets
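bbox_transform (bbox_transform.py) computes the standard R-CNN offsets (dx, dy, dw, dh) from matched ex/gt boxes; this sketch mirrors its logic:

```python
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    # Widths/heights/centers in the inclusive-coordinate convention used above.
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    # Center offsets normalized by the ex box size; log-space size ratios.
    dx = (gt_cx - ex_cx) / ex_w
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)
    dh = np.log(gt_h / ex_h)
    return np.vstack((dx, dy, dw, dh)).transpose()

ex = np.array([[0., 0., 9., 9.]])
print(bbox_transform(ex, ex))  # identical boxes -> all-zero targets
```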
In stage1_fast_rcnn_train.pt:
name: "ZF"
layer {
  name: 'data'
  type: 'Python'
  top: 'data'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}
This instantiates the roi_data_layer.layer Python layer, which calls get_minibatch in minibatch.py. (In the end2end scheme, the ProposalTargetLayer plays the same role.)
get_minibatch
rois_per_image:
the maximum number of RoIs per image, BATCH_SIZE / IMS_PER_BATCH = 128/2 = 64
fg_rois_per_image:
0.25 * rois_per_image = 16 foregrounds (fg:bg ratio 1:3)
im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
yields the scale factors (ratio between the network input and the original image) and im_blob, the network input blob
if cfg.TRAIN.HAS_RPN:
(the RPN-training branch)
else:
(the Fast R-CNN branch)
mainly calls the _sample_rois function
to get labels, overlaps, im_rois, bbox_targets, bbox_inside_weights; the roidb coordinates refer to the original image (P*Q), so im_rois does too
rois = _project_im_rois(im_rois, im_scales[im_i])
scales the RoIs to the network input size (M*N)
rois_blob:
a 5-column 2-D array: the first column is the image index within the batch, the other four the coordinates; the rois of all batch images are stacked and assigned to blobs['rois']
labels_blob, bbox_targets_blob, bbox_inside_blob
are likewise stacked across the batch and assigned to blobs['labels'], blobs['bbox_targets'], blobs['bbox_inside_weights'], blobs['bbox_outside_weights']
blobs['bbox_outside_weights'] = np.array(bbox_inside_blob > 0).astype(np.float32)
bbox_outside_weights are the element-wise weights applied to the SmoothL1 bbox loss (1 wherever a regression target is active, 0 elsewhere)
The function returns the blobs dict
_sample_rois
fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
overlap >= 0.5 counts as foreground (note this threshold also selects the ground truth itself, overlaps = 1; the random sampling below may therefore pick gt boxes, whose regression offsets are 0)
fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size)
the smaller of fg_rois_per_image (16) and the number of foregrounds
fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
randomly pick fg_rois_per_this_image foregrounds
bg_rois_per_this_image:
the number of backgrounds: the smaller of 64 - fg_rois_per_this_image and the count of RoIs with overlap < 0.5
bg_inds:
indices of the randomly chosen backgrounds
keep_inds = np.append(fg_inds, bg_inds)
concatenate the foreground and background indices
labels = labels[keep_inds]
the sampled labels
labels[fg_rois_per_this_image:] = 0
background labels are set to 0
overlaps = overlaps[keep_inds]
rois = rois[keep_inds]
bbox_targets, bbox_inside_weights = _get_bbox_regression_labels(roidb['bbox_targets'][keep_inds, :], num_classes)
bbox_targets has shape len(keep_inds) x 84 (84 = 4 * num_classes): the compact 5-column form is expanded so that each foreground row carries its 4 offsets in the columns of its class, with everything else 0. bbox_inside_weights has the same shape, set to 1 exactly where bbox_targets is written.
"""Compute minibatch blobs for training a Fast R-CNN network."""

def get_minibatch(roidb, num_classes):
    """Given a roidb, construct a minibatch sampled from it."""
    num_images = len(roidb)
    # Sample random scales to use for each image in this batch
    random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES),
                                    size=num_images)
    assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \
        'num_images ({}) must divide BATCH_SIZE ({})'. \
        format(num_images, cfg.TRAIN.BATCH_SIZE)
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    # fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
    fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(np.int)

    # Get the input image blob, formatted for caffe
    im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
    blobs = {'data': im_blob}

    if cfg.TRAIN.HAS_RPN:
        assert len(im_scales) == 1, "Single batch only"
        assert len(roidb) == 1, "Single batch only"
        # gt boxes: (x1, y1, x2, y2, cls)
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
    else:  # not using RPN
        # Now, build the region of interest and label blobs
        rois_blob = np.zeros((0, 5), dtype=np.float32)
        labels_blob = np.zeros((0), dtype=np.float32)
        bbox_targets_blob = np.zeros((0, 4 * num_classes), dtype=np.float32)
        bbox_inside_blob = np.zeros(bbox_targets_blob.shape, dtype=np.float32)
        # all_overlaps = []
        for im_i in xrange(num_images):
            labels, overlaps, im_rois, bbox_targets, bbox_inside_weights \
                = _sample_rois(roidb[im_i], fg_rois_per_image, rois_per_image,
                               num_classes)
            # Add to RoIs blob
            rois = _project_im_rois(im_rois, im_scales[im_i])
            batch_ind = im_i * np.ones((rois.shape[0], 1))
            rois_blob_this_image = np.hstack((batch_ind, rois))
            rois_blob = np.vstack((rois_blob, rois_blob_this_image))
            # Add to labels, bbox targets, and bbox loss blobs
            labels_blob = np.hstack((labels_blob, labels))
            bbox_targets_blob = np.vstack((bbox_targets_blob, bbox_targets))
            bbox_inside_blob = np.vstack((bbox_inside_blob, bbox_inside_weights))
            # all_overlaps = np.hstack((all_overlaps, overlaps))

        # For debug visualizations
        # _vis_minibatch(im_blob, rois_blob, labels_blob, all_overlaps)

        blobs['rois'] = rois_blob
        blobs['labels'] = labels_blob

        if cfg.TRAIN.BBOX_REG:
            blobs['bbox_targets'] = bbox_targets_blob
            blobs['bbox_inside_weights'] = bbox_inside_blob
            blobs['bbox_outside_weights'] = \
                np.array(bbox_inside_blob > 0).astype(np.float32)

    return blobs
def _sample_rois(roidb, fg_rois_per_image, rois_per_image, num_classes):
    """Generate a random sample of RoIs comprising foreground and background
    examples.
    """
    # label = class RoI has max overlap with
    labels = roidb['max_classes']
    overlaps = roidb['max_overlaps']
    rois = roidb['boxes']

    # Select foreground RoIs as those with >= FG_THRESH overlap
    fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # Guard against the case when an image has fewer than fg_rois_per_image
    # foreground RoIs
    fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size)
    # Sample foreground regions without replacement
    if fg_inds.size > 0:
        fg_inds = npr.choice(
            fg_inds, size=fg_rois_per_this_image, replace=False)

    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    bg_rois_per_this_image = np.minimum(bg_rois_per_this_image,
                                        bg_inds.size)
    # Sample background regions without replacement
    if bg_inds.size > 0:
        bg_inds = npr.choice(
            bg_inds, size=bg_rois_per_this_image, replace=False)

    # The indices that we're selecting (both fg and bg)
    keep_inds = np.append(fg_inds, bg_inds)
    # Select sampled values from various arrays:
    labels = labels[keep_inds]
    # Clamp labels for the background RoIs to 0
    labels[fg_rois_per_this_image:] = 0
    overlaps = overlaps[keep_inds]
    rois = rois[keep_inds]

    bbox_targets, bbox_inside_weights = _get_bbox_regression_labels(
        roidb['bbox_targets'][keep_inds, :], num_classes)

    return labels, overlaps, rois, bbox_targets, bbox_inside_weights
def _get_image_blob(roidb, scale_inds):
    """Builds an input blob from the images in the roidb at the specified
    scales.
    """
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)

    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)

    return blob, im_scales

def _project_im_rois(im_rois, im_scale_factor):
    """Project image RoIs into the rescaled training image."""
    rois = im_rois * im_scale_factor
    return rois
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """Bounding-box regression targets are stored in a compact form in the
    roidb.

    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets). The loss weights
    are similarly expanded.

    Returns:
        bbox_target_data (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0]
    for ind in inds:
        ind = int(ind)  # cast to int for indexing under newer numpy
        cls = clss[ind]
        start = int(4 * cls)
        end = int(start + 4)
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights