Continuing from the previous article, Faster RCNN Source Code Analysis (1).
I will break the second stage into 3 modules and cover each of them in detail below.
RPN
For an introduction to the RPN in Faster RCNN, you can read the paper or browse the forums; Medium, CSDN, Zhihu, and Jianshu all have plenty of introductory material. This article only walks through each step's implementation from the source-code perspective, so the theory is not restated here.
Code entry point
lib/model/train_val.py
# Construct the computation graph
lr, train_op = self.construct_graph(sess)
lr is the learning rate; train_op is the series of operations that trains the network.
Let's step into the construct_graph function.
lib/model/train_val.py
def construct_graph(self, sess):
  with sess.graph.as_default():
    # Set the random seed for tensorflow
    tf.set_random_seed(cfg.RNG_SEED)
    # Build the main computation graph
    layers = self.net.create_architecture('TRAIN', self.imdb.num_classes, tag='default',
                                          anchor_scales=cfg.ANCHOR_SCALES,
                                          anchor_ratios=cfg.ANCHOR_RATIOS)
    # Define the loss
    loss = layers['total_loss']
    # Set learning rate and momentum
    lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)
    self.optimizer = tf.train.MomentumOptimizer(lr, cfg.TRAIN.MOMENTUM)
    # Compute the gradients with regard to the loss
    gvs = self.optimizer.compute_gradients(loss)
    # Double the gradient of the bias if set
    if cfg.TRAIN.DOUBLE_BIAS:
      final_gvs = []
      with tf.variable_scope('Gradient_Mult') as scope:
        for grad, var in gvs:
          scale = 1.
          if cfg.TRAIN.DOUBLE_BIAS and '/biases:' in var.name:
            scale *= 2.
          if not np.allclose(scale, 1.0):
            grad = tf.multiply(grad, scale)
          final_gvs.append((grad, var))
      train_op = self.optimizer.apply_gradients(final_gvs)
    else:
      train_op = self.optimizer.apply_gradients(gvs)
    # We will handle the snapshots ourselves
    self.saver = tf.train.Saver(max_to_keep=100000)
    # Write the train and validation information to tensorboard
    self.writer = tf.summary.FileWriter(self.tbdir, sess.graph)
    self.valwriter = tf.summary.FileWriter(self.tbvaldir)
  return lr, train_op
The code already lays the flow out clearly, but to summarize:
- Set the random seed for TensorFlow (so that runs are reproducible)
- Build the computation graph (the key step, described below)
- Define an optimizer that runs the Momentum algorithm (a minimal numeric sketch follows this list):
  accumulation = momentum * accumulation + gradient
  variable -= learning_rate * accumulation
- Compute the gradients of the loss: self.optimizer.compute_gradients(loss)
- Apply the gradients to the variables: self.optimizer.apply_gradients(gvs); the return value is train_op
- Define the Saver (for snapshots/checkpoints) and the writer/valwriter (which stream information to TensorBoard)
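As a minimal standalone sketch of the momentum update above (the values are made up for illustration only):

import numpy as np

learning_rate, momentum = 0.1, 0.9
variable = np.array([1.0, -2.0])
gradient = np.array([0.5, 0.25])       # pretend the gradient is constant
accumulation = np.zeros_like(variable)

for _ in range(3):
    accumulation = momentum * accumulation + gradient
    variable -= learning_rate * accumulation
print(variable)  # the effective step size grows as accumulation builds up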
Next we step into the create_architecture function.
lib/nets/network.py
def create_architecture(self, mode, num_classes, tag=None,
                        anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
  self._image = tf.placeholder(tf.float32, shape=[1, None, None, 3])
  self._im_info = tf.placeholder(tf.float32, shape=[3])
  self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
  self._tag = tag
  self._num_classes = num_classes
  self._mode = mode
  self._anchor_scales = anchor_scales
  self._num_scales = len(anchor_scales)
  self._anchor_ratios = anchor_ratios
  self._num_ratios = len(anchor_ratios)
  self._num_anchors = self._num_scales * self._num_ratios
  training = mode == 'TRAIN'
  testing = mode == 'TEST'
  assert tag != None
  # handle most of the regularizers here
  weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)
  if cfg.TRAIN.BIAS_DECAY:
    biases_regularizer = weights_regularizer
  else:
    biases_regularizer = tf.no_regularizer
  # list as many types of layers as possible, even if they are not used now
  with arg_scope([slim.conv2d, slim.conv2d_in_plane, \
                  slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected],
                 weights_regularizer=weights_regularizer,
                 biases_regularizer=biases_regularizer,
                 biases_initializer=tf.constant_initializer(0.0)):
    rois, cls_prob, bbox_pred = self._build_network(training)
  layers_to_output = {'rois': rois}
  for var in tf.trainable_variables():
    self._train_summaries.append(var)
  if testing:
    stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (self._num_classes))
    means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (self._num_classes))
    self._predictions["bbox_pred"] *= stds
    self._predictions["bbox_pred"] += means
  else:
    self._add_losses()
    layers_to_output.update(self._losses)
    val_summaries = []
    with tf.device("/cpu:0"):
      val_summaries.append(self._add_gt_image_summary())
      for key, var in self._event_summaries.items():
        val_summaries.append(tf.summary.scalar(key, var))
      for key, var in self._score_summaries.items():
        self._add_score_summary(key, var)
      for var in self._act_summaries:
        self._add_act_summary(var)
      for var in self._train_summaries:
        self._add_train_summary(var)
    self._summary_op = tf.summary.merge_all()
    self._summary_op_val = tf.summary.merge(val_summaries)
  layers_to_output.update(self._predictions)
  return layers_to_output
Many people (myself included) are not yet very familiar with TensorFlow, so here is a summary of the program flow:
- Assign values to the Network object's member variables
- Define the regularizer for the weights
- Build the network via self._build_network(training) (the key step)
- Define the loss functions: the RPN class loss, the RPN bbox loss, and the class loss and final object bbox loss of the whole RCNN; see _add_losses for the details (a sketch of its smooth L1 term follows this list)
- Update the parameters used by TensorBoard
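The bbox terms inside _add_losses are computed with the smooth L1 loss from the Fast R-CNN paper. A minimal NumPy sketch of just that function (the inside/outside weighting used in the repo is omitted here):

import numpy as np

def smooth_l1(x, sigma=1.0):
    # quadratic near zero, linear further out, switching at |x| = 1 / sigma^2
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0 / sigma2,
                    0.5 * sigma2 * x ** 2,
                    abs_x - 0.5 / sigma2)

diff = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])  # bbox_pred - bbox_targets
print(smooth_l1(diff))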
Next, let's look at the _build_network function.
lib/nets/network.py
def _build_network(self, is_training=True):
  # select initializers
  if cfg.TRAIN.TRUNCATED:
    initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
    initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
  else:
    initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
    initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)
  net_conv = self._image_to_head(is_training)
  with tf.variable_scope(self._scope, self._scope):
    # build the anchors for the image
    self._anchor_component()
    # region proposal network
    rois = self._region_proposal(net_conv, is_training, initializer)
    # region of interest pooling
    if cfg.POOLING_MODE == 'crop':
      pool5 = self._crop_pool_layer(net_conv, rois, "pool5")
    else:
      raise NotImplementedError
  fc7 = self._head_to_tail(pool5, is_training)
  with tf.variable_scope(self._scope, self._scope):
    # region classification
    cls_prob, bbox_pred = self._region_classification(fc7, is_training,
                                                      initializer, initializer_bbox)
  self._score_summaries.update(self._predictions)
  return rois, cls_prob, bbox_pred
- Initialize the weights, with either a truncated normal initializer or a random normal initializer
- Build the front end of the backbone network: _image_to_head
- Build the anchors
- Build the RPN
- ROI pooling, via _crop_pool_layer
- Build the tail of the backbone: fc7 = self._head_to_tail(pool5, is_training)
- Object classification and bounding-box regression
Feeling lost? No problem, each of these steps is described in detail below.
Building the backbone front end
_image_to_head is an abstract method of the Network class; take its ResNet-101 implementation as an example.
lib/nets/resnet_v1.py
def _image_to_head(self, is_training, reuse=None):
  assert (0 <= cfg.RESNET.FIXED_BLOCKS <= 3)
  # Now the base is always fixed during training
  with slim.arg_scope(resnet_arg_scope(is_training=False)):
    net_conv = self._build_base()
  if cfg.RESNET.FIXED_BLOCKS > 0:
    with slim.arg_scope(resnet_arg_scope(is_training=False)):
      net_conv, _ = resnet_v1.resnet_v1(net_conv,
                                        self._blocks[0:cfg.RESNET.FIXED_BLOCKS],
                                        global_pool=False,
                                        include_root_block=False,
                                        reuse=reuse,
                                        scope=self._scope)
  if cfg.RESNET.FIXED_BLOCKS < 3:
    with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
      net_conv, _ = resnet_v1.resnet_v1(net_conv,
                                        self._blocks[cfg.RESNET.FIXED_BLOCKS:-1],
                                        global_pool=False,
                                        include_root_block=False,
                                        reuse=reuse,
                                        scope=self._scope)
  self._act_summaries.append(net_conv)
  self._layers['head'] = net_conv
  return net_conv
def _build_base(self):
  with tf.variable_scope(self._scope, self._scope):
    net = resnet_utils.conv2d_same(self._image, 64, 7, stride=2, scope='conv1')
    net = tf.pad(net, [[0, 0], [1, 1], [1, 1], [0, 0]])
    net = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='pool1')
  return net
I will cover ResNet itself in the next article; here is just an outline of the flow.
- Call _build_base to build the first few layers by hand: input -> 64 filters of 7 * 7, stride = 2 -> padding -> max pooling
- Build the network trunk; self._blocks was defined earlier:
self._blocks = [resnet_v1_block('block1', base_depth=64, num_units=3, stride=2),
                resnet_v1_block('block2', base_depth=128, num_units=8, stride=2),
                # use stride 1 for the last conv4 layer
                resnet_v1_block('block3', base_depth=256, num_units=36, stride=1),
                resnet_v1_block('block4', base_depth=512, num_units=3, stride=1)]
Then resnet_v1.resnet_v1(), slim's ResNet v1 interface, is called to build this part of the network.
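A quick sanity check of the head's effective stride (my own arithmetic, not spelled out in the repo): conv1 and pool1 each halve the resolution, block1 and block2 halve it again, and block3/block4 keep stride 1, which matches the _feat_stride of 16 used later.

strides = [2, 2, 2, 2, 1, 1]  # conv1, pool1, block1, block2, block3, block4
feat_stride = 1
for s in strides:
    feat_stride *= s
print(feat_stride)  # 16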
Building the anchors
What exactly is an anchor? Here I borrow part of an answer by the Zhihu author 馬塔:
What is the essence of an anchor? Essentially, it back-projects the fixed-size output of the conv5_3 layer into inputs of different sizes. Next, the anchor window sizes, and where they come from: the most basic anchor is a single 16*16 window; a set of base area scales (8, 16, 32) is then defined, and multiplying these three scales by 16 gives three areas (128^2, 256^2, 512^2); finally, within each area, three aspect ratios (1:1, 1:2, 2:1) are enumerated. Altogether this yields 9 anchors of different sizes. A common diagram illustrates this.
That diagram is somewhat misleading, though. First, the 9 boxes it shows are not centered on a single point; in reality, 9 boxes are generated around every point of the feature map as its center. Second, the generated anchor sizes are not based on the feature map at all; the final sizes are determined by the anchor ratios and anchor scales, and the largest anchor is roughly as large as the resized input image itself.
In the generate_anchors source file you can see the following data:
# anchors =
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
These are the 9 base anchors. Their coordinates are in xyxy format and represent the first group of 9 anchors at the top-left corner of the image; every anchor used later is obtained by translating these across the feature map (the coordinates refer to the resized image, not the original one).
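A quick check of those numbers, as a standalone NumPy snippet using only the table above: each row should decode to roughly one of the three areas and one of the three aspect ratios.

import numpy as np

anchors = np.array([[-83, -39, 100, 56], [-175, -87, 192, 104],
                    [-359, -183, 376, 200], [-55, -55, 72, 72],
                    [-119, -119, 136, 136], [-247, -247, 264, 264],
                    [-35, -79, 52, 96], [-79, -167, 96, 184],
                    [-167, -343, 184, 360]])
w = anchors[:, 2] - anchors[:, 0] + 1
h = anchors[:, 3] - anchors[:, 1] + 1
print(np.sqrt(w * h))  # roughly 128, 256, 512 within each ratio group
print(h / w)           # roughly 0.5, 1.0, 2.0 for the three groups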
How these anchors are actually used is the key to understanding the whole design.
Above we obtained the base network's final conv5_3 output of 1*38*67*1024 (1024 is the channel count). On top of these features, a 3x3 window slides over the 38*67 area with stride 1 and same-padding, which yields 38*67 windows of size 3x3.
For each 3x3 window, we compute the point in the original image corresponding to the window's center. The author then supposes that this 3x3 window was obtained from the original image by SPP-style pooling, and that the area and aspect ratio of the pooled region are exactly those of an anchor. In other words, each 3x3 window is assumed to come from 9 different original regions via pooling, and all of these regions share the same center point in the original image: the point corresponding to the center of the 3x3 window. This way, at every window position, we can use the 9 anchors of different aspect ratios and areas to back out the corresponding regions in the original image, whose sizes and coordinates are all known. And those regions are exactly the proposals we want. So through the sliding window and the anchors we obtain 38*67*9 proposals over the original image. Next, for each proposal we output only 6 parameters: the foreground and background probabilities obtained by comparing the proposal with the ground truth (2 parameters, corresponding to cls_score), and, because each proposal differs from the ground truth in position and size, the 4 translation and scaling parameters needed to map the proposal onto the ground truth (corresponding to bbox_pred).
To add a bit of my own understanding: anchors exist for multi-scale object detection, replacing image pyramids and feature pyramids. Why can they serve that purpose? Look at the final layer's output, M*N*(9*2). If we consider only the first convolution kernel at the first of the M*N feature points, what does it represent? It effectively uses that kernel to aggregate the information around that point in the image (a 3*3 convolution was applied in the previous step) and to judge whether a target of the first size is present. That is, each kernel is responsible for detecting targets of one size, so with 18 kernels, every 2 of them handling one scale, multi-scale detection is achieved. It is a clever idea: judging by the final effect, it is essentially a multi-scale objectness heat map, or, in the authors' words, something like an 'attention' mechanism.
It is also worth noting the fully convolutional structure used here (a 3*3 convolution followed by 1*1 convolutions): M*N is itself a 2-D structure, in correspondence with the 2-D pixel structure of the original image, so we can directly tell whether a target exists in the original-image region corresponding to each feature point. Personally I feel that once you understand the fully convolutional structure and the anchor mechanism here, the whole Faster RCNN becomes much clearer.
One last point to make explicit: in the code, anchor, proposal, rois, and boxes all denote essentially the same thing, a suggested region or box, but there is a progression among them. First come the anchors, up to about 20000 of them (the count varies with the resized image size); then the proposals recommended by the RPN, of which there are 2000 during training; then the rois actually used for classification, 256 per image; and the final outputs are the boxes.
That concludes the detailed explanation of anchors adapted from Zhihu; I know you came for the code.
The entry point is in network.py's _build_network function.
# build the anchors for the image
self._anchor_component()
......
......
def _anchor_component(self):
  with tf.variable_scope('ANCHOR_' + self._tag) as scope:
    # just to get the shape right
    height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))
    width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))
    if cfg.USE_E2E_TF:
      anchors, anchor_length = generate_anchors_pre_tf(
        height,
        width,
        self._feat_stride,
        self._anchor_scales,
        self._anchor_ratios
      )
    else:
      anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                          [height, width,
                                           self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                          [tf.float32, tf.int32], name="generate_anchors")
    anchors.set_shape([None, 4])
    anchor_length.set_shape([])
    self._anchors = anchors
    self._anchor_length = anchor_length
- First compute the shift offsets for every feature-map position
- Then generate the 9 base anchors
lib/layer_utils/snippets.py
def generate_anchors_pre_tf(height, width, feat_stride=16, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
  shift_x = tf.range(width) * feat_stride  # width
  shift_y = tf.range(height) * feat_stride  # height
  shift_x, shift_y = tf.meshgrid(shift_x, shift_y)
  sx = tf.reshape(shift_x, shape=(-1,))
  sy = tf.reshape(shift_y, shape=(-1,))
  shifts = tf.transpose(tf.stack([sx, sy, sx, sy]))
  K = tf.multiply(width, height)
  shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2))
  anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
  A = anchors.shape[0]
  anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32)
  length = K * A
  anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))
  return tf.cast(anchors_tf, dtype=tf.float32), length
The anchor-generation code is self-explanatory. This script has no special environment requirements, so you can also run the file directly and step through it with breakpoints to see the whole flow clearly.
lib/layer_utils/generate_anchors.py
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2 ** np.arange(3, 6)):
  """
  Generate anchor (reference) windows by enumerating aspect ratios X
  scales wrt a reference (0, 0, 15, 15) window.
  """
  base_anchor = np.array([1, 1, base_size, base_size]) - 1
  ratio_anchors = _ratio_enum(base_anchor, ratios)
  anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                       for i in range(ratio_anchors.shape[0])])
  return anchors

def _whctrs(anchor):
  """
  Return width, height, x center, and y center for an anchor (window).
  """
  w = anchor[2] - anchor[0] + 1
  h = anchor[3] - anchor[1] + 1
  x_ctr = anchor[0] + 0.5 * (w - 1)
  y_ctr = anchor[1] + 0.5 * (h - 1)
  return w, h, x_ctr, y_ctr

def _mkanchors(ws, hs, x_ctr, y_ctr):
  """
  Given a vector of widths (ws) and heights (hs) around a center
  (x_ctr, y_ctr), output a set of anchors (windows).
  """
  ws = ws[:, np.newaxis]
  hs = hs[:, np.newaxis]
  anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                       y_ctr - 0.5 * (hs - 1),
                       x_ctr + 0.5 * (ws - 1),
                       y_ctr + 0.5 * (hs - 1)))
  return anchors

def _ratio_enum(anchor, ratios):
  """
  Enumerate a set of anchors for each aspect ratio wrt an anchor.
  """
  w, h, x_ctr, y_ctr = _whctrs(anchor)
  size = w * h
  size_ratios = size / ratios
  ws = np.round(np.sqrt(size_ratios))
  hs = np.round(ws * ratios)
  anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
  return anchors

def _scale_enum(anchor, scales):
  """
  Enumerate a set of anchors for each scale wrt an anchor.
  """
  w, h, x_ctr, y_ctr = _whctrs(anchor)
  ws = w * scales
  hs = h * scales
  anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
  return anchors
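For example, a quick standalone run (assuming Python is launched from lib/layer_utils/ so the module imports directly):

import numpy as np
from generate_anchors import generate_anchors

print(generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                       scales=2 ** np.arange(3, 6)))
# should print the 9 base anchors listed earlier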
Building the RPN
The RPN slides over the feature map, convolving it with 256 (or 512) 3*3 filters to form an intermediate layer. It serves two main purposes:
- Predict the coordinates x, y and the width/height w, h of the proposal corresponding to each center anchor
- Decide whether each proposal region is foreground or background
Code entry point:
# region proposal network
rois = self._region_proposal(net_conv, is_training, initializer)
lib/nets/network.py
def _region_proposal(self, net_conv, is_training, initializer):
  rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer,
                    scope="rpn_conv/3x3")
  self._act_summaries.append(rpn)
  rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                              weights_initializer=initializer,
                              padding='VALID', activation_fn=None, scope='rpn_cls_score')
  # change it so that the score has 2 as its channel size
  rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
  rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
  rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
  rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
  rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                              weights_initializer=initializer,
                              padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
  if is_training:
    rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
    rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
    # Try to have a deterministic order for the computing graph, for reproducibility
    with tf.control_dependencies([rpn_labels]):
      rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
  else:
    if cfg.TEST.MODE == 'nms':
      rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
    elif cfg.TEST.MODE == 'top':
      rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
    else:
      raise NotImplementedError
  self._predictions["rpn_cls_score"] = rpn_cls_score
  self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape
  self._predictions["rpn_cls_prob"] = rpn_cls_prob
  self._predictions["rpn_cls_pred"] = rpn_cls_pred
  self._predictions["rpn_bbox_pred"] = rpn_bbox_pred
  self._predictions["rois"] = rois
  return rois
The detailed logic of _region_proposal can feel a bit convoluted, so first recall the overall Faster RCNN flow.
The _region_proposal function does the following:
- Convolve the feature map [60 * 40 * 256] (the size depends on the original image resolution, the rescaling, and the chosen feature extractor) with 256 filters of 3*3 (to further fuse features), yielding another [60 * 40 * 256] map
- Convolve with 18 filters of 1*1, i.e. 9*2: each position has 9 anchors, and each anchor gets 2 scores, for foreground and background. A reshape -> softmax -> reshape sequence then produces the objectness prediction and its score. Two outputs matter here: rpn_cls_pred (the predicted class) and rpn_cls_prob (the foreground/background probabilities). The supervision signal is Y = 0, 1, indicating whether the region is ground truth.
- Convolve with 36 filters of 1*1, i.e. 9 * 4, to obtain the anchor-box coordinate information, which is actually a set of offsets:
  ground truth: each labeled box has a center position x, y and width/height w, h
  anchor box: center position x_a, y_a and width/height w_a, h_a
  So the offsets are:
  Δx = (x - x_a) / w_a    Δy = (y - y_a) / h_a
  Δw = log(w / w_a)       Δh = log(h / h_a)
  The network learns from the discrepancy between the ground-truth boxes and the predicted anchor boxes, so that the RPN weights acquire the ability to predict boxes (a small numeric sketch follows this list).
- rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois"): this step obtains 2000 proposals
- rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor"): computes the label value of each anchor
- rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois"): selects __C.TRAIN.BATCH_SIZE positive and negative samples as one training mini-batch. This step assigns labels to the proposals provided by the RPN and computes the offsets between the proposals and the ground-truth boxes, which are used to learn the regression parameters of the network's last layer (bbox_pred).
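Here is a minimal NumPy sketch of the offset equations above, with made-up boxes:

import numpy as np

# hypothetical anchor and ground-truth boxes, as (x_center, y_center, w, h)
x_a, y_a, w_a, h_a = 50.0, 60.0, 184.0, 96.0
x, y, w, h = 58.0, 55.0, 150.0, 120.0

dx, dy = (x - x_a) / w_a, (y - y_a) / h_a
dw, dh = np.log(w / w_a), np.log(h / h_a)
print(dx, dy, dw, dh)  # the regression targets the RPN learns to predict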
Steps 4, 5, and 6 are covered in detail next.
4. rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
lib/nets/network.py
def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
  with tf.variable_scope(name) as scope:
    if cfg.USE_E2E_TF:
      rois, rpn_scores = proposal_layer_tf(
        rpn_cls_prob,
        rpn_bbox_pred,
        self._im_info,
        self._mode,
        self._feat_stride,
        self._anchors,
        self._num_anchors
      )
    else:
      rois, rpn_scores = tf.py_func(proposal_layer,
                                    [rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
                                     self._feat_stride, self._anchors, self._num_anchors],
                                    [tf.float32, tf.float32], name="proposal")
    rois.set_shape([None, 5])
    rpn_scores.set_shape([None, 1])
  return rois, rpn_scores
The flow of proposal_layer_tf is as follows:
- Read the configuration: post_nms_topN (the number of proposals kept after NMS) and nms_thresh (the NMS threshold)
- Transform the raw anchor proposals, using the learned parameters rpn_bbox_pred, into boxes close to the ground truth, and clip off the parts that extend beyond the image
- Run the NMS algorithm to obtain the final proposals
Here is a simple walkthrough of the NMS algorithm on an example.
Suppose there are 6 boxes recognized as a person, each with a confidence score.
Now the redundant ones must be removed:
- Sort by confidence: 0.95, 0.9, 0.9, 0.8, 0.7, 0.7
- Take the box with the highest score, 0.95, as one object box
- Among the remaining 5 boxes, discard those whose IoU with the 0.95 box exceeds 0.6 (configurable); this leaves the 0.9, 0.8, and 0.7 boxes
- Repeat the steps above until no boxes remain; 0.9 becomes another object box
The selected boxes are: 0.95 and 0.9.
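A minimal NumPy sketch of this greedy NMS (a standalone illustration, not the repo's implementation):

import numpy as np

def nms(boxes, scores, iou_thresh=0.6):
    # boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,)
    order = scores.argsort()[::-1]  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.95, 0.9, 0.8])
print(nms(boxes, scores))  # [0, 2]: the 0.9 box overlaps the 0.95 box too much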
lib/layer_utils/proposal_layer.py
def proposal_layer_tf(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
  if type(cfg_key) == bytes:
    cfg_key = cfg_key.decode('utf-8')
  pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
  post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
  nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
  # Get the scores and bounding boxes
  scores = rpn_cls_prob[:, :, :, num_anchors:]
  scores = tf.reshape(scores, shape=(-1,))
  rpn_bbox_pred = tf.reshape(rpn_bbox_pred, shape=(-1, 4))
  proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
  proposals = clip_boxes_tf(proposals, im_info[:2])
  # Non-maximal suppression
  indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)
  boxes = tf.gather(proposals, indices)
  boxes = tf.to_float(boxes)
  scores = tf.gather(scores, indices)
  scores = tf.reshape(scores, shape=(-1, 1))
  # Only support single image as input
  batch_inds = tf.zeros((tf.shape(indices)[0], 1), dtype=tf.float32)
  blob = tf.concat([batch_inds, boxes], 1)
  return blob, scores
- bbox_transform_inv_tf: applies the previously learned offsets (the rpn_bbox_pred parameters) to each anchor box as a translation and scaling, producing the final predicted boxes. That is, the original proposal A is transformed, via the parameters learned in rpn_bbox_pred, into a predicted box G' close to the ground truth G.
- clip_boxes_tf: clips off any parts that extend beyond the original image boundary.
lib/model/bbox_transform.py
def bbox_transform_inv_tf(boxes, deltas):
  boxes = tf.cast(boxes, deltas.dtype)
  widths = tf.subtract(boxes[:, 2], boxes[:, 0]) + 1.0
  heights = tf.subtract(boxes[:, 3], boxes[:, 1]) + 1.0
  ctr_x = tf.add(boxes[:, 0], widths * 0.5)
  ctr_y = tf.add(boxes[:, 1], heights * 0.5)
  dx = deltas[:, 0]
  dy = deltas[:, 1]
  dw = deltas[:, 2]
  dh = deltas[:, 3]
  pred_ctr_x = tf.add(tf.multiply(dx, widths), ctr_x)
  pred_ctr_y = tf.add(tf.multiply(dy, heights), ctr_y)
  pred_w = tf.multiply(tf.exp(dw), widths)
  pred_h = tf.multiply(tf.exp(dh), heights)
  pred_boxes0 = tf.subtract(pred_ctr_x, pred_w * 0.5)
  pred_boxes1 = tf.subtract(pred_ctr_y, pred_h * 0.5)
  pred_boxes2 = tf.add(pred_ctr_x, pred_w * 0.5)
  pred_boxes3 = tf.add(pred_ctr_y, pred_h * 0.5)
  return tf.stack([pred_boxes0, pred_boxes1, pred_boxes2, pred_boxes3], axis=1)

def clip_boxes_tf(boxes, im_info):
  b0 = tf.maximum(tf.minimum(boxes[:, 0], im_info[1] - 1), 0)
  b1 = tf.maximum(tf.minimum(boxes[:, 1], im_info[0] - 1), 0)
  b2 = tf.maximum(tf.minimum(boxes[:, 2], im_info[1] - 1), 0)
  b3 = tf.maximum(tf.minimum(boxes[:, 3], im_info[0] - 1), 0)
  return tf.stack([b0, b1, b2, b3], axis=1)
5践樱、rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
這一步要給 self._anchors 中的所有 anchor 賦以一個 label 值:
- 1 : 正樣本
- 0 : 負(fù)樣本
- -1 : 非樣本, 不用于訓(xùn)練
同時初始化一些參數(shù)用于后面計算損失函數(shù)(關(guān)于損失函數(shù), 大家可以去看https://blog.csdn.net/wfei101/article/details/79809332厂画,我覺得講的比論文清楚的多): - rpn_bbox_targets: PRN網(wǎng)絡(luò)邊框回歸的ground truth
- rpn_bbox_inside_weights: label為1的行,也就是目標(biāo)區(qū)域?yàn)榍熬暗男锌叫希瑓?shù)為[1.0 1.0 1.0 1.0]袱院,其余為0
- rpn_bbox_outside_weights: label為0或者1的行,參數(shù)為[1.0 1.0 1.0 1.0] / len(fg+bg)
lib/nets/network.py
def _anchor_target_layer(self, rpn_cls_score, name):
  with tf.variable_scope(name) as scope:
    rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
      anchor_target_layer,
      [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
      [tf.float32, tf.float32, tf.float32, tf.float32],
      name="anchor_target")
    rpn_labels.set_shape([1, 1, None, None])
    rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
    rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
    rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])
    rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
    self._anchor_targets['rpn_labels'] = rpn_labels
    self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets
    self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights
    self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights
    self._score_summaries.update(self._anchor_targets)
  return rpn_labels
lib/layer_utils/anchor_target_layer.py
def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, all_anchors, num_anchors):
  """Same as the anchor target layer in original Fast/er RCNN """
  A = num_anchors
  total_anchors = all_anchors.shape[0]
  K = total_anchors / num_anchors
  # allow boxes to sit over the edge by a small amount
  _allowed_border = 0
  # map of shape (..., H, W)
  height, width = rpn_cls_score.shape[1:3]
  # only keep anchors inside the image
  inds_inside = np.where(
    (all_anchors[:, 0] >= -_allowed_border) &
    (all_anchors[:, 1] >= -_allowed_border) &
    (all_anchors[:, 2] < im_info[1] + _allowed_border) &  # width
    (all_anchors[:, 3] < im_info[0] + _allowed_border)  # height
  )[0]
  # keep only inside anchors
  anchors = all_anchors[inds_inside, :]
  # label: 1 is positive, 0 is negative, -1 is dont care
  labels = np.empty((len(inds_inside),), dtype=np.float32)
  labels.fill(-1)
  # overlaps between the anchors and the gt boxes
  # overlaps (ex, gt)
  overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float),
    np.ascontiguousarray(gt_boxes, dtype=np.float))
  argmax_overlaps = overlaps.argmax(axis=1)
  max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
  gt_argmax_overlaps = overlaps.argmax(axis=0)
  gt_max_overlaps = overlaps[gt_argmax_overlaps,
                             np.arange(overlaps.shape[1])]
  gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
  if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels first so that positive labels can clobber them
    # first set the negatives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
  # fg label: for each gt, anchor with highest overlap
  labels[gt_argmax_overlaps] = 1
  # fg label: above threshold IOU
  labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
  if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels last so that negative labels can clobber positives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
  # subsample positive labels if we have too many
  num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
  fg_inds = np.where(labels == 1)[0]
  if len(fg_inds) > num_fg:
    disable_inds = npr.choice(
      fg_inds, size=(len(fg_inds) - num_fg), replace=False)
    labels[disable_inds] = -1
  # subsample negative labels if we have too many
  num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
  bg_inds = np.where(labels == 0)[0]
  if len(bg_inds) > num_bg:
    disable_inds = npr.choice(
      bg_inds, size=(len(bg_inds) - num_bg), replace=False)
    labels[disable_inds] = -1
  bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
  bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
  bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
  # only the positive ones have regression targets
  bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
  bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
  if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
    # uniform weighting of examples (given non-uniform sampling)
    num_examples = np.sum(labels >= 0)
    positive_weights = np.ones((1, 4)) * 1.0 / num_examples
    negative_weights = np.ones((1, 4)) * 1.0 / num_examples
  else:
    assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
            (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
    positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                        np.sum(labels == 1))
    negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                        np.sum(labels == 0))
  bbox_outside_weights[labels == 1, :] = positive_weights
  bbox_outside_weights[labels == 0, :] = negative_weights
  # map up to original set of anchors
  labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
  bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
  bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
  bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
  # labels
  labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
  labels = labels.reshape((1, 1, A * height, width))
  rpn_labels = labels
  # bbox_targets
  bbox_targets = bbox_targets \
    .reshape((1, height, width, A * 4))
  rpn_bbox_targets = bbox_targets
  # bbox_inside_weights
  bbox_inside_weights = bbox_inside_weights \
    .reshape((1, height, width, A * 4))
  rpn_bbox_inside_weights = bbox_inside_weights
  # bbox_outside_weights
  bbox_outside_weights = bbox_outside_weights \
    .reshape((1, height, width, A * 4))
  rpn_bbox_outside_weights = bbox_outside_weights
  return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights

def _compute_targets(ex_rois, gt_rois):
  """Compute bounding-box regression targets for an image."""
  assert ex_rois.shape[0] == gt_rois.shape[0]
  assert ex_rois.shape[1] == 4
  assert gt_rois.shape[1] == 5
  return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
lib/model/bbox_transform.py
def bbox_transform(ex_rois, gt_rois):
  ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
  ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
  ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
  ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
  gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
  gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
  gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
  gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
  targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
  targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
  targets_dw = np.log(gt_widths / ex_widths)
  targets_dh = np.log(gt_heights / ex_heights)
  targets = np.vstack(
    (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
  return targets
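Note that bbox_transform here is the inverse of the bbox_transform_inv_tf from step 4: encoding a ground-truth box against an anchor and then decoding should recover it. A quick standalone round-trip check mirroring the two functions, with made-up boxes:

import numpy as np

ex = np.array([[0.0, 0.0, 183.0, 95.0]])    # an anchor, in xyxy form
gt = np.array([[10.0, 5.0, 160.0, 120.0]])  # a ground-truth box

# encode, mirroring bbox_transform
ew, eh = ex[:, 2] - ex[:, 0] + 1.0, ex[:, 3] - ex[:, 1] + 1.0
ecx, ecy = ex[:, 0] + 0.5 * ew, ex[:, 1] + 0.5 * eh
gw, gh = gt[:, 2] - gt[:, 0] + 1.0, gt[:, 3] - gt[:, 1] + 1.0
gcx, gcy = gt[:, 0] + 0.5 * gw, gt[:, 1] + 0.5 * gh
dx, dy = (gcx - ecx) / ew, (gcy - ecy) / eh
dw, dh = np.log(gw / ew), np.log(gh / eh)

# decode, mirroring bbox_transform_inv_tf: center and size come back exactly
pcx, pcy = dx * ew + ecx, dy * eh + ecy
pw, ph = np.exp(dw) * ew, np.exp(dh) * eh
print(np.allclose([pcx, pcy, pw, ph], [gcx, gcy, gw, gh]))  # True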
6瞭稼、rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
從之前RPN給出的2000個proposal中選出__C.TRAIN.BATCH_SIZE(128, 其中25%是前景, 75%是背景)作為訓(xùn)練的一批忽洛。
lib/nets/network.py
def _proposal_target_layer(self, rois, roi_scores, name):
  with tf.variable_scope(name) as scope:
    rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = tf.py_func(
      proposal_target_layer,
      [rois, roi_scores, self._gt_boxes, self._num_classes],
      [tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32],
      name="proposal_target")
    rois.set_shape([cfg.TRAIN.BATCH_SIZE, 5])
    roi_scores.set_shape([cfg.TRAIN.BATCH_SIZE])
    labels.set_shape([cfg.TRAIN.BATCH_SIZE, 1])
    bbox_targets.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
    bbox_inside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
    bbox_outside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
    self._proposal_targets['rois'] = rois
    self._proposal_targets['labels'] = tf.to_int32(labels, name="to_int32")
    self._proposal_targets['bbox_targets'] = bbox_targets
    self._proposal_targets['bbox_inside_weights'] = bbox_inside_weights
    self._proposal_targets['bbox_outside_weights'] = bbox_outside_weights
    self._score_summaries.update(self._proposal_targets)
  return rois, roi_scores
lib/layer_utils/proposal_target_layer.py
def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
  """
  Assign object detection proposals to ground-truth targets. Produces proposal
  classification labels and bounding-box regression targets.
  """
  # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
  # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
  all_rois = rpn_rois
  all_scores = rpn_scores
  # Include ground-truth boxes in the set of candidate rois
  if cfg.TRAIN.USE_GT:
    zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
    all_rois = np.vstack(
      (all_rois, np.hstack((zeros, gt_boxes[:, :-1])))
    )
    # not sure if it a wise appending, but anyway i am not using it
    all_scores = np.vstack((all_scores, zeros))
  num_images = 1
  rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
  fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
  # Sample rois with classification labels and bounding box regression
  # targets
  labels, rois, roi_scores, bbox_targets, bbox_inside_weights = _sample_rois(
    all_rois, all_scores, gt_boxes, fg_rois_per_image,
    rois_per_image, _num_classes)
  rois = rois.reshape(-1, 5)
  roi_scores = roi_scores.reshape(-1)
  labels = labels.reshape(-1, 1)
  bbox_targets = bbox_targets.reshape(-1, _num_classes * 4)
  bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes * 4)
  bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
  return rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights

def _get_bbox_regression_labels(bbox_target_data, num_classes):
  """Bounding-box regression targets (bbox_target_data) are stored in a
  compact form N x (class, tx, ty, tw, th)

  This function expands those targets into the 4-of-4*K representation used
  by the network (i.e. only one class has non-zero targets).

  Returns:
    bbox_target (ndarray): N x 4K blob of regression targets
    bbox_inside_weights (ndarray): N x 4K blob of loss weights
  """
  clss = bbox_target_data[:, 0]
  bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
  bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
  inds = np.where(clss > 0)[0]
  for ind in inds:
    cls = clss[ind]
    start = int(4 * cls)
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
  return bbox_targets, bbox_inside_weights

def _compute_targets(ex_rois, gt_rois, labels):
  """Compute bounding-box regression targets for an image."""
  assert ex_rois.shape[0] == gt_rois.shape[0]
  assert ex_rois.shape[1] == 4
  assert gt_rois.shape[1] == 4
  targets = bbox_transform(ex_rois, gt_rois)
  if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
    # Optionally normalize targets by a precomputed mean and stdev
    targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
               / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
  return np.hstack(
    (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)

def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
  """Generate a random sample of RoIs comprising foreground and background
  examples.
  """
  # overlaps: (rois x gt_boxes)
  overlaps = bbox_overlaps(
    np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
    np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
  gt_assignment = overlaps.argmax(axis=1)
  max_overlaps = overlaps.max(axis=1)
  labels = gt_boxes[gt_assignment, 4]
  # Select foreground RoIs as those with >= FG_THRESH overlap
  fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
  # Guard against the case when an image has fewer than fg_rois_per_image
  # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
  bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                     (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
  # Small modification to the original version where we ensure a fixed number of regions are sampled
  if fg_inds.size > 0 and bg_inds.size > 0:
    fg_rois_per_image = min(fg_rois_per_image, fg_inds.size)
    fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_image), replace=False)
    bg_rois_per_image = rois_per_image - fg_rois_per_image
    to_replace = bg_inds.size < bg_rois_per_image
    bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_image), replace=to_replace)
  elif fg_inds.size > 0:
    to_replace = fg_inds.size < rois_per_image
    fg_inds = npr.choice(fg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = rois_per_image
  elif bg_inds.size > 0:
    to_replace = bg_inds.size < rois_per_image
    bg_inds = npr.choice(bg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = 0
  else:
    import pdb
    pdb.set_trace()
  # The indices that we're selecting (both fg and bg)
  keep_inds = np.append(fg_inds, bg_inds)
  # Select sampled values from various arrays:
  labels = labels[keep_inds]
  # Clamp labels for the background RoIs to 0
  labels[int(fg_rois_per_image):] = 0
  rois = all_rois[keep_inds]
  roi_scores = all_scores[keep_inds]
  bbox_target_data = _compute_targets(
    rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
  bbox_targets, bbox_inside_weights = \
    _get_bbox_regression_labels(bbox_target_data, num_classes)
  return labels, rois, roi_scores, bbox_targets, bbox_inside_weights
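To make the 4-of-4*K expansion in _get_bbox_regression_labels concrete, here is a tiny standalone example with hypothetical numbers (num_classes=3):

import numpy as np

# two sampled rois: one of class 2 with targets (tx, ty, tw, th), one background
bbox_target_data = np.array([[2.0, 0.1, -0.2, 0.3, 0.05],
                             [0.0, 0.0, 0.0, 0.0, 0.0]])
num_classes = 3
bbox_targets = np.zeros((2, 4 * num_classes), dtype=np.float32)
for ind in np.where(bbox_target_data[:, 0] > 0)[0]:
    start = int(4 * bbox_target_data[ind, 0])
    bbox_targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
print(bbox_targets)
# row 0 is zero except in columns 8..11 (the slot for class 2); row 1 is all zeros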
The ROI layer
The inputs to this layer are the feature map that the backbone produced from the original image, and the rois.
The ROI layer does 3 things:
- Since the rois are in original-image coordinates, first map them onto the feature map
- Divide each mapped region into sections of equal size
- Apply max pooling to each section
The code is quite clear:
lib/nets/network.py
def _crop_pool_layer(self, bottom, rois, name):
  with tf.variable_scope(name) as scope:
    batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
    # Get the normalized coordinates of bounding boxes
    bottom_shape = tf.shape(bottom)
    height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
    width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
    x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
    y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
    x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
    y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
    # Won't be back-propagated to rois anyway, but to save time
    bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
    pre_pool_size = cfg.POOLING_SIZE * 2
    crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")
  return slim.max_pool2d(crops, [2, 2], padding='SAME')
bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1)): this line mainly saves computation time, since back-propagation stops at this point (it would not be propagated back to the rois anyway).
crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops"): this line crops each roi out of the feature map and resizes it to 14 * 14 sections, mainly to accommodate feature maps from images of different sizes.
return slim.max_pool2d(crops, [2, 2], padding='SAME'): this line returns the result after a max pooling with a 2 * 2 filter, so the output is a 7 * 7 map.
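A shape walkthrough of _crop_pool_layer under typical settings (POOLING_SIZE = 7; the sizes below are my own example, not from the repo):

import tensorflow as tf
import tensorflow.contrib.slim as slim

# hypothetical sizes: 256 rois over a 38x63 feature map with 1024 channels
bottom = tf.placeholder(tf.float32, [1, 38, 63, 1024])
bboxes = tf.placeholder(tf.float32, [256, 4])  # normalized y1, x1, y2, x2
batch_ids = tf.zeros([256], dtype=tf.int32)    # all rois come from image 0
crops = tf.image.crop_and_resize(bottom, bboxes, batch_ids, [14, 14])
pooled = slim.max_pool2d(crops, [2, 2], padding='SAME')
print(crops.get_shape())   # (256, 14, 14, 1024)
print(pooled.get_shape())  # (256, 7, 7, 1024)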
Building the fully connected layers
fc7 = self._head_to_tail(pool5, is_training)
The code below needs little explanation: it implements the FC part with resnet_v1.resnet_v1.
lib/nets/resnet_v1.py
def _head_to_tail(self, pool5, is_training, reuse=None):
  with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
    fc7, _ = resnet_v1.resnet_v1(pool5,
                                 self._blocks[-1:],
                                 global_pool=False,
                                 include_root_block=False,
                                 reuse=reuse,
                                 scope=self._scope)
    # average pooling done by reduce_mean
    fc7 = tf.reduce_mean(fc7, axis=[1, 2])
  return fc7
Region classification and box regression
cls_prob, bbox_pred = self._region_classification(fc7, is_training,
                                                  initializer, initializer_bbox)
This code is the output part of the whole network.
lib/nets/network.py
def _region_classification(self, fc7, is_training, initializer, initializer_bbox):
  cls_score = slim.fully_connected(fc7, self._num_classes,
                                   weights_initializer=initializer,
                                   trainable=is_training,
                                   activation_fn=None, scope='cls_score')
  cls_prob = self._softmax_layer(cls_score, "cls_prob")
  cls_pred = tf.argmax(cls_score, axis=1, name="cls_pred")
  bbox_pred = slim.fully_connected(fc7, self._num_classes * 4,
                                   weights_initializer=initializer_bbox,
                                   trainable=is_training,
                                   activation_fn=None, scope='bbox_pred')
  self._predictions["cls_score"] = cls_score
  self._predictions["cls_pred"] = cls_pred
  self._predictions["cls_prob"] = cls_prob
  self._predictions["bbox_pred"] = bbox_pred
  return cls_prob, bbox_pred
That concludes the training part of Faster RCNN. The testing code is simpler and broadly similar to training, so this article will not go over it.