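Faster RCNN Source Code Analysis (2)

Continuing from the previous article, Faster RCNN Source Code Analysis (1).
I will split the second stage into 3 modules, each described in detail below.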

RPN

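For an introduction to the RPN in Faster RCNN, you can read the paper or browse forums: Medium, CSDN, Zhihu and even Jianshu have plenty of material on it. This article only walks through how each step is implemented, from the source code's point of view, so I won't explain the theory here. Sorry about that~~
Code entry point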

lib/model/train_val.py
# Construct the computation graph
    lr, train_op = self.construct_graph(sess)

lr is the learning rate, and train_op is the sequence of operations that trains the network.
Let's step into the construct_graph function:

lib/model/train_val.py
  def construct_graph(self, sess):
    with sess.graph.as_default():
      # Set the random seed for tensorflow
      tf.set_random_seed(cfg.RNG_SEED)
      # Build the main computation graph
      layers = self.net.create_architecture('TRAIN', self.imdb.num_classes, tag='default',
                                            anchor_scales=cfg.ANCHOR_SCALES,
                                            anchor_ratios=cfg.ANCHOR_RATIOS)
      # Define the loss
      loss = layers['total_loss']
      # Set learning rate and momentum
      lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)
      self.optimizer = tf.train.MomentumOptimizer(lr, cfg.TRAIN.MOMENTUM)

      # Compute the gradients with regard to the loss
      gvs = self.optimizer.compute_gradients(loss)
      # Double the gradient of the bias if set
      if cfg.TRAIN.DOUBLE_BIAS:
        final_gvs = []
        with tf.variable_scope('Gradient_Mult') as scope:
          for grad, var in gvs:
            scale = 1.
            if cfg.TRAIN.DOUBLE_BIAS and '/biases:' in var.name:
              scale *= 2.
            if not np.allclose(scale, 1.0):
              grad = tf.multiply(grad, scale)
            final_gvs.append((grad, var))
        train_op = self.optimizer.apply_gradients(final_gvs)
      else:
        train_op = self.optimizer.apply_gradients(gvs)

      # We will handle the snapshots ourselves
      self.saver = tf.train.Saver(max_to_keep=100000)
      # Write the train and validation information to tensorboard
      self.writer = tf.summary.FileWriter(self.tbdir, sess.graph)
      self.valwriter = tf.summary.FileWriter(self.tbvaldir)

    return lr, train_op

The code already lays out the flow very clearly, but let me ramble a bit and summarize it anyway~~

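  1. Set TensorFlow's random seed (cfg.RNG_SEED) so that runs are reproducible
  2. Build the computational graph (the key part, introduced below)
  3. Define an optimizer that runs the Momentum algorithm:

accumulation = momentum * accumulation + gradient
variable -= learning_rate * accumulation

  4. Compute the gradients of the loss with respect to the parameters: self.optimizer.compute_gradients(loss). If cfg.TRAIN.DOUBLE_BIAS is set, the gradients of all bias variables are multiplied by 2
  5. Apply the gradients to the variables with self.optimizer.apply_gradients(...); the return value is train_op
  6. Define the Saver (for snapshots/checkpoints), plus writer and valwriter (to send information to TensorBoard as training runs)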
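To make steps 3 to 5 concrete, here is a minimal NumPy sketch of one Momentum update, including the optional doubling of bias gradients. It mirrors the update rule above and the '/biases:' name check in construct_graph, but it is a plain-Python illustration with hypothetical variable names, not TensorFlow's implementation.

```python
import numpy as np

def momentum_step(params, grads, accum, lr=0.001, momentum=0.9, double_bias=True):
    """One Momentum update over dicts that map a variable name to a NumPy array.

    Like cfg.TRAIN.DOUBLE_BIAS, gradients of variables whose name contains
    '/biases:' are multiplied by 2 before the update.
    """
    for name, grad in grads.items():
        scale = 2.0 if (double_bias and '/biases:' in name) else 1.0
        grad = grad * scale
        # accumulation = momentum * accumulation + gradient
        accum[name] = momentum * accum[name] + grad
        # variable -= learning_rate * accumulation
        params[name] = params[name] - lr * accum[name]
    return params, accum

params = {'conv1/weights:0': np.array([1.0, 2.0]), 'conv1/biases:0': np.array([0.5])}
grads = {'conv1/weights:0': np.array([0.1, 0.1]), 'conv1/biases:0': np.array([0.1])}
accum = {k: np.zeros_like(v) for k, v in params.items()}
params, accum = momentum_step(params, grads, accum, lr=0.1, momentum=0.9)
print(params['conv1/weights:0'])  # each weight moved by lr * grad = 0.01
print(params['conv1/biases:0'])   # the bias moved twice as far: lr * 2 * grad = 0.02
```

On the first step the accumulator is just the (scaled) gradient; from the second step on, the old accumulator keeps pushing in the same direction, which is what smooths the updates.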

Next, let's step into the create_architecture function:

lib/nets/network.py
  def create_architecture(self, mode, num_classes, tag=None,
                          anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    self._image = tf.placeholder(tf.float32, shape=[1, None, None, 3])
    self._im_info = tf.placeholder(tf.float32, shape=[3])
    self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
    self._tag = tag

    self._num_classes = num_classes
    self._mode = mode
    self._anchor_scales = anchor_scales
    self._num_scales = len(anchor_scales)

    self._anchor_ratios = anchor_ratios
    self._num_ratios = len(anchor_ratios)

    self._num_anchors = self._num_scales * self._num_ratios

    training = mode == 'TRAIN'
    testing = mode == 'TEST'

    assert tag != None

    # handle most of the regularizers here
    weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)
    if cfg.TRAIN.BIAS_DECAY:
      biases_regularizer = weights_regularizer
    else:
      biases_regularizer = tf.no_regularizer

    # list as many types of layers as possible, even if they are not used now
    with arg_scope([slim.conv2d, slim.conv2d_in_plane, \
                    slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected], 
                    weights_regularizer=weights_regularizer,
                    biases_regularizer=biases_regularizer, 
                    biases_initializer=tf.constant_initializer(0.0)): 
      rois, cls_prob, bbox_pred = self._build_network(training)

    layers_to_output = {'rois': rois}

    for var in tf.trainable_variables():
      self._train_summaries.append(var)

    if testing:
      stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (self._num_classes))
      means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (self._num_classes))
      self._predictions["bbox_pred"] *= stds
      self._predictions["bbox_pred"] += means
    else:
      self._add_losses()
      layers_to_output.update(self._losses)

      val_summaries = []
      with tf.device("/cpu:0"):
        val_summaries.append(self._add_gt_image_summary())
        for key, var in self._event_summaries.items():
          val_summaries.append(tf.summary.scalar(key, var))
        for key, var in self._score_summaries.items():
          self._add_score_summary(key, var)
        for var in self._act_summaries:
          self._add_act_summary(var)
        for var in self._train_summaries:
          self._add_train_summary(var)

      self._summary_op = tf.summary.merge_all()
      self._summary_op_val = tf.summary.merge(val_summaries)

    layers_to_output.update(self._predictions)

    return layers_to_output

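Many people (myself included) are not very familiar with TensorFlow yet, so here is a summary of the program flow:

  1. Assign the network's member variables
  2. Define the regularizer for the weights (and, if cfg.TRAIN.BIAS_DECAY is set, for the biases)
  3. Build the network with self._build_network(training) (the key part)
  4. In training mode, define the losses: the RPN class loss, the RPN bbox loss, the class loss of the whole RCNN network, and the loss of the final object bounding boxes; see the _add_losses function for details. In test mode, the predicted bbox deltas are instead de-normalized with cfg.TRAIN.BBOX_NORMALIZE_STDS and cfg.TRAIN.BBOX_NORMALIZE_MEANS
  5. Update the summaries that TensorBoard uses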
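The de-normalization in the TEST branch undoes a normalization applied to the regression targets during training. A minimal NumPy sketch, assuming the targets were normalized as (t - mean) / std and assuming the values (0.1, 0.1, 0.2, 0.2) and (0, 0, 0, 0) for cfg.TRAIN.BBOX_NORMALIZE_STDS and cfg.TRAIN.BBOX_NORMALIZE_MEANS; the class count and target values are made up.

```python
import numpy as np

num_classes = 3  # hypothetical: background + 2 object classes
# np.tile repeats the 4 per-coordinate values once per class, as in create_architecture
stds = np.tile(np.array([0.1, 0.1, 0.2, 0.2]), num_classes)    # shape (4 * num_classes,)
means = np.tile(np.array([0.0, 0.0, 0.0, 0.0]), num_classes)

raw_targets = np.array([[0.05, -0.02, 0.3, -0.1] * num_classes])  # one RoI, all classes
normalized = (raw_targets - means) / stds   # what the network is trained to output
bbox_pred = normalized.copy()               # pretend the network predicted it exactly
bbox_pred *= stds                           # the same two steps as the TEST branch
bbox_pred += means
print(np.allclose(bbox_pred, raw_targets))
```

Dividing by small stds scales the targets up, so the regression loss is not dominated by the much larger classification loss; at test time the scaling simply has to be reversed.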

Next, let's look at the _build_network function:

lib/nets/network.py
  def _build_network(self, is_training=True):
    # select initializers
    if cfg.TRAIN.TRUNCATED:
      initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
      initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
    else:
      initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
      initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)

    net_conv = self._image_to_head(is_training)
    with tf.variable_scope(self._scope, self._scope):
      # build the anchors for the image
      self._anchor_component()
      # region proposal network
      rois = self._region_proposal(net_conv, is_training, initializer)
      # region of interest pooling
      if cfg.POOLING_MODE == 'crop':
        pool5 = self._crop_pool_layer(net_conv, rois, "pool5")
      else:
        raise NotImplementedError

    fc7 = self._head_to_tail(pool5, is_training)
    with tf.variable_scope(self._scope, self._scope):
      # region classification
      cls_prob, bbox_pred = self._region_classification(fc7, is_training, 
                                                        initializer, initializer_bbox)

    self._score_summaries.update(self._predictions)

    return rois, cls_prob, bbox_pred
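  1. Choose the weight initializers: a truncated normal initializer or a plain normal initializer, depending on cfg.TRAIN.TRUNCATED (stddev 0.01, and 0.001 for the bbox layer)
  2. Build the front part of the backbone network: _image_to_head
  3. Build the anchors
  4. Build the RPN
  5. ROI pooling, by calling _crop_pool_layer
  6. Build the tail of the backbone network: fc7 = self._head_to_tail(pool5, is_training)
  7. Object classification and bounding-box regression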

Feeling a bit lost? Don't worry, I'll go through each of the steps above in detail below.

Building the front part of the backbone network

_image_to_head is an abstract method of the Network class. Let's take its ResNet-101 implementation as an example:

lib/nets/resnet_v1.py
  def _image_to_head(self, is_training, reuse=None):
    assert (0 <= cfg.RESNET.FIXED_BLOCKS <= 3)
    # Now the base is always fixed during training
    with slim.arg_scope(resnet_arg_scope(is_training=False)):
      net_conv = self._build_base()
    if cfg.RESNET.FIXED_BLOCKS > 0:
      with slim.arg_scope(resnet_arg_scope(is_training=False)):
        net_conv, _ = resnet_v1.resnet_v1(net_conv,
                                           self._blocks[0:cfg.RESNET.FIXED_BLOCKS],
                                           global_pool=False,
                                           include_root_block=False,
                                           reuse=reuse,
                                           scope=self._scope)
    if cfg.RESNET.FIXED_BLOCKS < 3:
      with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
        net_conv, _ = resnet_v1.resnet_v1(net_conv,
                                           self._blocks[cfg.RESNET.FIXED_BLOCKS:-1],
                                           global_pool=False,
                                           include_root_block=False,
                                           reuse=reuse,
                                           scope=self._scope)

    self._act_summaries.append(net_conv)
    self._layers['head'] = net_conv

    return net_conv

  def _build_base(self):
    with tf.variable_scope(self._scope, self._scope):
      net = resnet_utils.conv2d_same(self._image, 64, 7, stride=2, scope='conv1')
      net = tf.pad(net, [[0, 0], [1, 1], [1, 1], [0, 0]])
      net = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='pool1')

    return net

I will cover ResNet in a dedicated article next time; here I only give a brief overview of the flow.

  1. Call _build_base to manually build the first few layers: input -> 64 filters of size 7×7, stride = 2 -> padding -> 3×3 max pooling, stride = 2
  2. Build the main body of the backbone, using self._blocks, which was defined earlier as:
self._blocks = [resnet_v1_block('block1', base_depth=64, num_units=3, stride=2),
                      resnet_v1_block('block2', base_depth=128, num_units=8, stride=2),
                      # use stride 1 for the last conv4 layer
                      resnet_v1_block('block3', base_depth=256, num_units=36, stride=1),
                      resnet_v1_block('block4', base_depth=512, num_units=3, stride=1)]

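This part of the network is implemented by calling slim's resnet_v1 interface, resnet_v1.resnet_v1(). The first cfg.RESNET.FIXED_BLOCKS blocks are built under resnet_arg_scope(is_training=False), i.e. kept fixed like the base layers, while the remaining blocks up to block3 follow is_training. _image_to_head stops at self._blocks[0:-1]; block4 is only used later, in _head_to_tail. Note also that the unit counts shown above (3, 8, 36, 3) are the ResNet-152 configuration; ResNet-101 uses (3, 4, 23, 3).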
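A quick way to see why _feat_stride is 16 for this backbone is to multiply the strides of the stages that run before the RPN. The sketch below writes the stage list by hand from _build_base and self._blocks[0:-1], and assumes each stage maps a size n to ceil(n / stride), which is what the 'SAME'-style padding of conv2d_same and the pad-then-VALID max pooling give; feature_map_size is a hypothetical helper.

```python
import math

# (stage name, stride) for the layers before the RPN: _build_base plus
# self._blocks[0:-1]; block4 only runs later, in _head_to_tail.
STAGES = [('conv1', 2), ('pool1', 2), ('block1', 2), ('block2', 2), ('block3', 1)]

def feature_map_size(h, w, stages=STAGES):
    """Track the spatial size through the stages, assuming out = ceil(in / stride)."""
    total_stride = 1
    for _name, s in stages:
        h, w = math.ceil(h / s), math.ceil(w / s)
        total_stride *= s
    return h, w, total_stride

print(feature_map_size(600, 1000))   # (38, 63, 16)
```

The result matches ceil(600 / 16) = 38 and ceil(1000 / 16) = 63, which is exactly how _anchor_component computes height and width from im_info and _feat_stride.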

Building the anchors

What is an anchor? Here I borrow part of an answer by Zhihu user 馬塔:

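What is an anchor, fundamentally? It takes the conv5_3 output, which has a fixed size, and works backwards to input regions of different sizes. Next come the window sizes of the anchors; let me explain where they come from. The most basic anchor has a single size, 16×16. The base scales are set to (8, 16, 32); multiplying 16 by these three scales gives three areas (128^2, 256^2, 512^2). Then, for each area, three aspect ratios are used (1:1, 1:2, 2:1). This gives a total of 9 anchors with different sizes and shapes. A diagram is shown below: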


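This diagram is actually rather misleading, though. First, the 9 boxes in the figure do not share a center point, whereas in reality 9 boxes are generated centered at every point of the feature map. Second, the size of the generated anchors is not based on the feature map (it has essentially nothing to do with it); the final size is determined by the anchor ratio and anchor scale, and the largest anchor is roughly as large as the resized image.
In the generate_anchors code file you can find the following data: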

# anchors =
# -83 -39 100 56
# -175 -87  192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360

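These are the 9 base anchors. Their coordinates are in (x1, y1, x2, y2) format and describe the 9 anchors at the first, top-left position; every anchor used later is obtained by shifting them across the feature map (the coordinates are in the resized image, not the original image). These commented values come from the original MATLAB implementation and use 1-based coordinates; the Python generate_anchors() shown below returns each coordinate one smaller, e.g. [-84, -40, 99, 55] for the first anchor.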
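The 9 base anchors can be re-derived with a few lines of NumPy, which is also a quick way to check the coordinates listed above. make_base_anchors is a hypothetical helper that repeats the arithmetic of _whctrs, _ratio_enum and _scale_enum from generate_anchors.py (width = x2 - x1 + 1, np.round in the ratio step), looping over ratios first and scales second, as generate_anchors does.

```python
import numpy as np

def make_base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Re-derive the 9 base anchors (x1, y1, x2, y2) centered on the first cell."""
    w = h = float(base_size)
    x_ctr = y_ctr = 0.5 * (base_size - 1)        # center of (0, 0, 15, 15) -> 7.5
    anchors = []
    for r in ratios:                              # outer loop: aspect ratio (h / w = r)
        ws = np.round(np.sqrt(w * h / r))         # keep the 16x16 area, change the shape
        hs = np.round(ws * r)
        for s in scales:                          # inner loop: enlarge by each scale
            sw, sh = ws * s, hs * s
            anchors.append([x_ctr - 0.5 * (sw - 1), y_ctr - 0.5 * (sh - 1),
                            x_ctr + 0.5 * (sw - 1), y_ctr + 0.5 * (sh - 1)])
    return np.array(anchors)

print(make_base_anchors())
```

Because a ratio r means h / w = r, the first three rows are wide boxes (184 × 96, 368 × 192, 736 × 384) and the last three are tall ones. Every value is one less than the corresponding commented value.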
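How exactly these anchors are used is the key to understanding the whole problem.
Above we obtained the final conv5_3 output of the base network, 1 × 38 × 67 × 1024 (1024 is the number of channels). On top of these features, a 3 × 3 sliding window moves over the 38 × 67 region with stride = 1 and padding = 1 (so the output keeps the same size), which yields 38 × 67 windows of size 3 × 3.
For each 3 × 3 window, we compute the point in the original image that corresponds to the window's center. The author then assumes that this 3 × 3 window was obtained from the original image by SPP pooling, and that the area and aspect ratio of the pooled region are given by the anchors. In other words, each 3 × 3 window is assumed to come from pooling 9 different regions of the original image, all sharing the same center in the original image, namely the point corresponding to the window's center mentioned above. So at every window position, from the 9 anchors with different aspect ratios and areas, we can work backwards to a region of the original image whose size and coordinates are known. That region is the proposal we want. Thus, using the sliding window and the anchors, we obtain 38 × 67 × 9 proposals in the original image. For each proposal we then output only 6 values: the predicted foreground and background probabilities from comparing the proposal with the ground truth (2 values, corresponding to cls_score); and, since the proposal differs from the ground truth in position and size, the 4 translation and scaling parameters needed to turn the proposal into the ground truth (corresponding to bbox_pred).
Adding a bit of my own understanding: anchors are used for multi-scale object detection, as a replacement for image pyramids and feature pyramids. Why does this work? Look at the output of the last layer, which is M × N × (9 × 2). If we look only at the first convolution kernel at the first of the M × N feature points, what does it represent? That kernel aggregates the information around that point of the image (a 3 × 3 neighborhood, since the previous step was a 3 × 3 convolution) and decides whether an object of the first size is present there. In other words, each kernel is responsible for detecting objects of one size; with 18 kernels, every 2 of them handle one task, which achieves multi-scale detection. It is a very clever idea. Judging by the final result, it is effectively a multi-scale object heatmap or, in the author's words, an "attention" mechanism.
It is also worth pointing out that this is a fully convolutional structure (a 3 × 3 convolution followed by 1 × 1 convolutions), so M × N is also a 2D structure that corresponds to the 2D pixel layout of the original image; we can therefore tell whether an object exists at the location in the original image that a feature point corresponds to. Personally, I feel that once you understand this fully convolutional structure and the anchor mechanism, the whole of Faster RCNN becomes much clearer.
Finally, one thing to make clear: in the code, anchor, proposal, rois and boxes all mean basically the same thing, namely proposed regions or boxes. The difference is that the names form a progression. First come the anchors, the most numerous at roughly 20000 (the exact number depends on the size of the resized image); next the proposals recommended by the RPN, still fairly numerous, 2000 during training; then the rois actually used for classification, cfg.TRAIN.BATCH_SIZE per image (128 in this code); and the final results are the boxes.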

That's the detailed explanation of anchors reposted from Zhihu. I know you're here for the code, though.
The entry point is the _build_network function in network.py:

# build the anchors for the image
      self._anchor_component()
      ......
      ......

  def _anchor_component(self):
    with tf.variable_scope('ANCHOR_' + self._tag) as scope:
      # just to get the shape right
      height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))
      width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))
      if cfg.USE_E2E_TF:
        anchors, anchor_length = generate_anchors_pre_tf(
          height,
          width,
          self._feat_stride,
          self._anchor_scales,
          self._anchor_ratios
        )
      else:
        anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                            [height, width,
                                             self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                            [tf.float32, tf.int32], name="generate_anchors")
      anchors.set_shape([None, 4])
      anchor_length.set_shape([])
      self._anchors = anchors
      self._anchor_length = anchor_length
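The anchors are generated by generate_anchors_pre_tf (or its NumPy counterpart generate_anchors_pre), which:
  1. First computes the shifts, i.e. the offset of every feature-map position in the image (position × feat_stride)
  2. Generates the 9 base anchors
  3. Adds every shift to the 9 base anchors, giving K × A = height × width × 9 anchors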
lib/layer_utils/snippets.py
def generate_anchors_pre_tf(height, width, feat_stride=16, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
  shift_x = tf.range(width) * feat_stride # width
  shift_y = tf.range(height) * feat_stride # height
  shift_x, shift_y = tf.meshgrid(shift_x, shift_y)
  sx = tf.reshape(shift_x, shape=(-1,))
  sy = tf.reshape(shift_y, shape=(-1,))
  shifts = tf.transpose(tf.stack([sx, sy, sx, sy]))
  K = tf.multiply(width, height)
  shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2))

  anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
  A = anchors.shape[0]
  anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32)

  length = K * A
  anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))
  
  return tf.cast(anchors_tf, dtype=tf.float32), length
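The same broadcasting can be reproduced in plain NumPy, which makes the ordering of the K × A anchors easy to inspect. shift_anchors is a hypothetical helper written for illustration; it assumes base anchors in (x1, y1, x2, y2) format, as returned by generate_anchors, and uses only 3 of the 9 base anchors to keep the example small.

```python
import numpy as np

def shift_anchors(base_anchors, height, width, feat_stride=16):
    """NumPy version of the broadcasting in generate_anchors_pre_tf.

    Returns a (height * width * A, 4) array: for each feature-map position
    (row-major order), the A base anchors shifted to that position.
    """
    shift_x = np.arange(width) * feat_stride
    shift_y = np.arange(height) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)             # both (height, width)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)  # (K, 4)
    A = base_anchors.shape[0]
    K = shifts.shape[0]
    # (1, A, 4) + (K, 1, 4) -> (K, A, 4), then flatten to (K * A, 4)
    all_anchors = base_anchors.reshape(1, A, 4) + shifts.reshape(K, 1, 4)
    return all_anchors.reshape(K * A, 4)

base = np.array([[-84, -40, 99, 55], [-56, -56, 71, 71], [-36, -80, 51, 95]])
anchors = shift_anchors(base, height=38, width=63)
print(anchors.shape)   # (7182, 4)
print(anchors[3])      # first anchor moved one cell (16 px) right: -68, -40, 115, 55
```

The position-major, anchor-minor order matters: rpn_bbox_pred has shape (1, H, W, A * 4), so reshaping it to (-1, 4) gives rows in exactly this order, one delta per anchor. With all 9 base anchors, a 38 × 63 feature map yields 38 × 63 × 9 = 21546 anchors, the "about 20000" mentioned above.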

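The anchor-generation code is easy to follow. This script has no special requirements on the environment (it only needs NumPy), so you can also run the file directly and step through it with a few breakpoints; the whole flow will become clear.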

lib/layer_utils/generate_anchors.py
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2 ** np.arange(3, 6)):
  """
  Generate anchor (reference) windows by enumerating aspect ratios X
  scales wrt a reference (0, 0, 15, 15) window.
  """

  base_anchor = np.array([1, 1, base_size, base_size]) - 1
  ratio_anchors = _ratio_enum(base_anchor, ratios)
  anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                       for i in range(ratio_anchors.shape[0])])
  return anchors


def _whctrs(anchor):
  """
  Return width, height, x center, and y center for an anchor (window).
  """

  w = anchor[2] - anchor[0] + 1
  h = anchor[3] - anchor[1] + 1
  x_ctr = anchor[0] + 0.5 * (w - 1)
  y_ctr = anchor[1] + 0.5 * (h - 1)
  return w, h, x_ctr, y_ctr


def _mkanchors(ws, hs, x_ctr, y_ctr):
  """
  Given a vector of widths (ws) and heights (hs) around a center
  (x_ctr, y_ctr), output a set of anchors (windows).
  """

  ws = ws[:, np.newaxis]
  hs = hs[:, np.newaxis]
  anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                       y_ctr - 0.5 * (hs - 1),
                       x_ctr + 0.5 * (ws - 1),
                       y_ctr + 0.5 * (hs - 1)))
  return anchors


def _ratio_enum(anchor, ratios):
  """
  Enumerate a set of anchors for each aspect ratio wrt an anchor.
  """

  w, h, x_ctr, y_ctr = _whctrs(anchor)
  size = w * h
  size_ratios = size / ratios
  ws = np.round(np.sqrt(size_ratios))
  hs = np.round(ws * ratios)
  anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
  return anchors


def _scale_enum(anchor, scales):
  """
  Enumerate a set of anchors for each scale wrt an anchor.
  """

  w, h, x_ctr, y_ctr = _whctrs(anchor)
  ws = w * scales
  hs = h * scales
  anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
  return anchors

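Building the RPN

The RPN slides a 3×3 convolution with 256 (or 512) filters over the feature map, so that every position, and therefore every group of anchors centered there, gets a feature vector; two sibling 1×1 convolutions (the "fully connected" layers of the paper) then make the predictions. It has two main jobs:

  1. Predict the center coordinates x, y and the width and height w, h of each proposal, as offsets relative to its anchor
  2. Decide whether each proposal region is foreground or background

Code entry point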

# region proposal network
      rois = self._region_proposal(net_conv, is_training, initializer)
lib/nets/network.py
  def _region_proposal(self, net_conv, is_training, initializer):
    rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer,
                        scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_cls_score')
    # change it so that the score has 2 as its channel size
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
    if is_training:
      rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
      # Try to have a deterministic order for the computing graph, for reproducibility
      with tf.control_dependencies([rpn_labels]):
        rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
    else:
      if cfg.TEST.MODE == 'nms':
        rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      elif cfg.TEST.MODE == 'top':
        rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      else:
        raise NotImplementedError

    self._predictions["rpn_cls_score"] = rpn_cls_score
    self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape
    self._predictions["rpn_cls_prob"] = rpn_cls_prob
    self._predictions["rpn_cls_pred"] = rpn_cls_pred
    self._predictions["rpn_bbox_pred"] = rpn_bbox_pred
    self._predictions["rois"] = rois

    return rois
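In the code above, rpn_cls_score goes through _reshape_layer -> _softmax_layer -> _reshape_layer. Those two helpers are not shown in this article, so the NumPy sketch below assumes the tf-faster-rcnn behavior: _reshape_layer transposes NHWC to channels-first, reshapes to [1, num_dim, -1, W] and transposes back, and _softmax_layer applies a softmax over the last axis. reshape_layer and softmax_last are hypothetical stand-ins. Under that assumption, channel a and channel a + A hold the background and foreground scores of anchor a, which is why proposal_layer_tf takes rpn_cls_prob[:, :, :, num_anchors:] as the foreground scores.

```python
import numpy as np

def reshape_layer(x, num_dim):
    """Mimic _reshape_layer: NHWC -> NCHW -> [1, num_dim, -1, W] -> NHWC."""
    w = x.shape[2]
    to_caffe = x.transpose(0, 3, 1, 2)               # (1, C, H, W)
    reshaped = to_caffe.reshape(1, num_dim, -1, w)   # (1, num_dim, C / num_dim * H, W)
    return reshaped.transpose(0, 2, 3, 1)            # (1, C / num_dim * H, W, num_dim)

def softmax_last(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

A, H, W = 9, 4, 5
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, 2 * A))   # output of the 1x1 conv
score_reshape = reshape_layer(rpn_cls_score, 2)     # (1, A*H, W, 2)
prob_reshape = softmax_last(score_reshape)          # softmax over (bg, fg)
rpn_cls_prob = reshape_layer(prob_reshape, 2 * A)   # back to (1, H, W, 2A)

# Anchor a at (h, w) pairs channel a (background) with channel a + A (foreground)
h, w, a = 2, 3, 4
pair = rpn_cls_score[0, h, w, [a, a + A]]
assert np.allclose(rpn_cls_prob[0, h, w, [a, a + A]], softmax_last(pair))
fg_scores = rpn_cls_prob[:, :, :, A:]               # same slice as proposal_layer_tf
print(score_reshape.shape, rpn_cls_prob.shape, fg_scores.shape)
```

The second reshape is the exact inverse of the first, so rpn_cls_prob has the same layout as rpn_cls_score and lines up with the anchors.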

Looked at in detail, the logic of _region_proposal is a bit convoluted, so let's first review the overall Faster RCNN flowchart:


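_region_proposal does the following:

  1. Convolve the feature map [60 × 40 × 256] (the actual size depends on the original image resolution, the resize scale and the chosen feature extractor) with 256 3 × 3 filters to extract features further, giving a [60 × 40 × 256] map
  2. Convolve that map with 18 1 × 1 filters, i.e. 9 × 2: 9 anchors per pixel, times 2 because each anchor gets 2 scores, one for background and one for foreground. A reshape -> softmax -> reshape sequence then gives the prediction of whether each anchor contains an object, along with its score. There are two outputs: rpn_cls_pred (the predicted class) and rpn_cls_prob (the background/foreground probabilities). The supervision signal is Y = 0 or 1, indicating whether the region is foreground, i.e. contains an object
  3. Convolve that map with 36 1 × 1 filters, i.e. 9 × 4, to get the coordinate information of the anchor boxes, which is actually a set of offsets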

ground truth: an annotated box also has a center (x, y) and a width and height (w, h)
anchor box: center (x_a, y_a), width and height (w_a, h_a)
So the offsets are:
Δx = (x - x_a) / w_a    Δy = (y - y_a) / h_a
Δw = log(w / w_a)    Δh = log(h / h_a)
The network learns from the difference between the ground truth boxes and the anchor boxes, which lets the RPN's weights learn to predict boxes.
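A small round trip makes these formulas concrete. encode follows bbox_transform and decode follows bbox_transform_inv_tf, both shown later in this article, rewritten in NumPy for a single box; the function names and the two boxes are made up for illustration. Note the pixel convention: width = x2 - x1 + 1 and center = x1 + 0.5 × width.

```python
import numpy as np

def encode(anchor, gt):
    """(dx, dy, dw, dh) that move `anchor` onto `gt`; both are (x1, y1, x2, y2)."""
    aw, ah = anchor[2] - anchor[0] + 1.0, anchor[3] - anchor[1] + 1.0
    ax, ay = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    gw, gh = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return np.array([(gx - ax) / aw, (gy - ay) / ah, np.log(gw / aw), np.log(gh / ah)])

def decode(anchor, deltas):
    """Apply deltas to `anchor`, with the same arithmetic as bbox_transform_inv_tf."""
    aw, ah = anchor[2] - anchor[0] + 1.0, anchor[3] - anchor[1] + 1.0
    ax, ay = anchor[0] + 0.5 * aw, anchor[1] + 0.5 * ah
    cx, cy = deltas[0] * aw + ax, deltas[1] * ah + ay
    w, h = np.exp(deltas[2]) * aw, np.exp(deltas[3]) * ah
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])

anchor = np.array([0.0, 0.0, 127.0, 127.0])   # a 128 x 128 anchor
gt = np.array([10.0, 20.0, 149.0, 119.0])     # a 140 x 100 ground-truth box
d = encode(anchor, gt)
print(d)                   # dx = 0.125, dy = 0.046875, dw = log(140/128), dh = log(100/128)
print(decode(anchor, d))   # 10, 20, 150, 120
```

x1 and y1 round-trip exactly, but x2 and y2 come out one pixel larger, because bbox_transform_inv_tf computes x2 as center + 0.5 × width without subtracting 1. For boxes hundreds of pixels wide the difference is negligible.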

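  4. rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois"): this step obtains the proposals (2000 during training)
  5. rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor"): computes the label of every anchor
  6. rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois"): selects __C.TRAIN.BATCH_SIZE positive and negative samples as a training mini batch. This step assigns labels to the proposals produced by the RPN and computes the offsets between the proposals and the ground truth boxes, which are used to learn the regression parameters of the network's last layer (bbox_pred)
Steps 5 and 6 only run in training mode; at test time only _proposal_layer (or _proposal_top_layer when cfg.TEST.MODE is 'top') is called.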

Next, let me go through steps 4, 5 and 6 in detail.

4. rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")

lib/nets/network.py
  def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
    with tf.variable_scope(name) as scope:
      if cfg.USE_E2E_TF:
        rois, rpn_scores = proposal_layer_tf(
          rpn_cls_prob,
          rpn_bbox_pred,
          self._im_info,
          self._mode,
          self._feat_stride,
          self._anchors,
          self._num_anchors
        )
      else:
        rois, rpn_scores = tf.py_func(proposal_layer,
                              [rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
                               self._feat_stride, self._anchors, self._num_anchors],
                              [tf.float32, tf.float32], name="proposal")

      rois.set_shape([None, 5])
      rpn_scores.set_shape([None, 1])

    return rois, rpn_scores

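proposal_layer_tf works as follows:

  1. Read the settings from the config: post_nms_topN (the number of proposals kept after NMS) and nms_thresh (the NMS threshold). pre_nms_topN is read as well, but this TF version does not use it
  2. Take the foreground scores, transform the proposals given by the original anchors with the learned rpn_bbox_pred so that they move closer to the ground truth, and clip any part that falls outside the image
  3. Run NMS to get the final proposals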

Let's use the example in the figure below to briefly explain the NMS algorithm:



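As shown in the figure, there are 6 boxes recognized as a person, each with a confidence score.
Now the redundant ones need to be removed:

  • Sort by confidence: 0.95, 0.9, 0.9, 0.8, 0.7, 0.7
  • Take the box with the highest score, 0.95, as an object box
  • Among the remaining 5 boxes, remove those whose overlap (IoU) with the 0.95 box is greater than 0.6 (the threshold is configurable); this leaves the 0.9, 0.8 and 0.7 boxes
  • Repeat the steps above until no boxes are left; the 0.9 box becomes another object box

The boxes selected are: 0.95, 0.9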
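The greedy procedure above fits in a few lines of NumPy. This is a plain-Python sketch of standard NMS, not the code behind tf.image.non_max_suppression; boxes are (x1, y1, x2, y2), the +1 pixel convention is an assumption borrowed from the rest of this codebase, and the coordinates are made up so that the scores match the example (0.95, 0.9, 0.7 around one person; 0.9, 0.8, 0.7 around another).

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array, using the +1 pixel convention."""
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_thresh=0.6):
    """Return the indices of the kept boxes, highest score first."""
    order = np.argsort(-scores, kind='stable')
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        ious = iou_one_to_many(boxes[best], boxes[order[1:]])
        order = order[1:][ious <= iou_thresh]   # drop boxes that overlap the kept one too much
    return keep

boxes = np.array([[0, 0, 99, 199], [5, 5, 104, 204], [2, 0, 101, 199],
                  [300, 0, 399, 199], [305, 8, 404, 207], [298, 2, 397, 201]], dtype=float)
scores = np.array([0.95, 0.9, 0.7, 0.9, 0.8, 0.7])
keep = greedy_nms(boxes, scores)
print(keep, scores[keep])   # [0, 3] [0.95 0.9 ]
```

After the 0.95 box is kept, its two neighbours (IoU about 0.86 and 0.96) are removed and the 0.9, 0.8 and 0.7 boxes of the second person survive; the next round keeps that 0.9 box and removes the rest.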

lib/layer_utils/proposal_layer.py
def proposal_layer_tf(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
  if type(cfg_key) == bytes:
    cfg_key = cfg_key.decode('utf-8')
  pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
  post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
  nms_thresh = cfg[cfg_key].RPN_NMS_THRESH

  # Get the scores and bounding boxes
  scores = rpn_cls_prob[:, :, :, num_anchors:]
  scores = tf.reshape(scores, shape=(-1,))
  rpn_bbox_pred = tf.reshape(rpn_bbox_pred, shape=(-1, 4))

  proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
  proposals = clip_boxes_tf(proposals, im_info[:2])

  # Non-maximal suppression
  indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)

  boxes = tf.gather(proposals, indices)
  boxes = tf.to_float(boxes)
  scores = tf.gather(scores, indices)
  scores = tf.reshape(scores, shape=(-1, 1))

  # Only support single image as input
  batch_inds = tf.zeros((tf.shape(indices)[0], 1), dtype=tf.float32)
  blob = tf.concat([batch_inds, boxes], 1)

  return blob, scores
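  • bbox_transform_inv_tf: shifts and scales each anchor's box by the predicted offsets (rpn_bbox_pred, the quantities the network learns) to obtain the final predicted box. In other words, an original proposal A is turned, via the parameters in rpn_bbox_pred, into a predicted box G' that is close to the ground truth G.
  • clip_boxes_tf: clips away any part of a box that lies outside the image (im_info holds the height and width of the resized image).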
lib/model/bbox_transform.py
def bbox_transform_inv_tf(boxes, deltas):
  boxes = tf.cast(boxes, deltas.dtype)
  widths = tf.subtract(boxes[:, 2], boxes[:, 0]) + 1.0
  heights = tf.subtract(boxes[:, 3], boxes[:, 1]) + 1.0
  ctr_x = tf.add(boxes[:, 0], widths * 0.5)
  ctr_y = tf.add(boxes[:, 1], heights * 0.5)

  dx = deltas[:, 0]
  dy = deltas[:, 1]
  dw = deltas[:, 2]
  dh = deltas[:, 3]

  pred_ctr_x = tf.add(tf.multiply(dx, widths), ctr_x)
  pred_ctr_y = tf.add(tf.multiply(dy, heights), ctr_y)
  pred_w = tf.multiply(tf.exp(dw), widths)
  pred_h = tf.multiply(tf.exp(dh), heights)

  pred_boxes0 = tf.subtract(pred_ctr_x, pred_w * 0.5)
  pred_boxes1 = tf.subtract(pred_ctr_y, pred_h * 0.5)
  pred_boxes2 = tf.add(pred_ctr_x, pred_w * 0.5)
  pred_boxes3 = tf.add(pred_ctr_y, pred_h * 0.5)

  return tf.stack([pred_boxes0, pred_boxes1, pred_boxes2, pred_boxes3], axis=1)

def clip_boxes_tf(boxes, im_info):
  b0 = tf.maximum(tf.minimum(boxes[:, 0], im_info[1] - 1), 0)
  b1 = tf.maximum(tf.minimum(boxes[:, 1], im_info[0] - 1), 0)
  b2 = tf.maximum(tf.minimum(boxes[:, 2], im_info[1] - 1), 0)
  b3 = tf.maximum(tf.minimum(boxes[:, 3], im_info[0] - 1), 0)
  return tf.stack([b0, b1, b2, b3], axis=1)

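5. rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")

This step assigns a label to every anchor in self._anchors:

  • 1: positive sample
  • 0: negative sample
  • -1: not a sample, not used for training

It also initializes some arrays that are used later to compute the loss (for the loss function, see https://blog.csdn.net/wfei101/article/details/79809332, which I think explains it much more clearly than the paper):

  • rpn_bbox_targets: the regression ground truth for the RPN boxes
  • rpn_bbox_inside_weights: for rows with label 1, i.e. foreground anchors, the values are [1.0 1.0 1.0 1.0]; all other rows are 0
  • rpn_bbox_outside_weights: for rows with label 0 or 1, the values are [1.0 1.0 1.0 1.0] / len(fg+bg) (the branch taken when cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0)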
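The labeling rules in anchor_target_layer below can be condensed into a short NumPy sketch. It assumes the thresholds RPN_POSITIVE_OVERLAP = 0.7 and RPN_NEGATIVE_OVERLAP = 0.3, an RPN batch of 256 with at most half positives (RPN_FG_FRACTION = 0.5), and RPN_CLOBBER_POSITIVES = False (negatives assigned first). label_anchors and pairwise_iou are hypothetical helpers standing in for the real code and bbox_overlaps, and the sketch skips the "inside the image" filter and the regression targets.

```python
import numpy as np

def pairwise_iou(a, b):
    """(N, 4) x (M, 4) -> (N, M) IoU matrix, +1 pixel convention."""
    xx1 = np.maximum(a[:, None, 0], b[None, :, 0])
    yy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    xx2 = np.minimum(a[:, None, 2], b[None, :, 2])
    yy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
    area_a = (a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1)
    area_b = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3,
                  batch_size=256, fg_fraction=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    overlaps = pairwise_iou(anchors, gt_boxes)           # (num_anchors, num_gt)
    max_overlaps = overlaps.max(axis=1)                  # best IoU of each anchor
    gt_max = overlaps.max(axis=0)                        # best IoU of each gt box
    gt_argmax = np.where(overlaps == gt_max)[0]          # anchors that reach it (ties included)

    labels = np.full(len(anchors), -1, dtype=np.float32) # -1: don't care
    labels[max_overlaps < neg_thresh] = 0                # negatives first
    labels[gt_argmax] = 1                                # best anchor(s) for each gt
    labels[max_overlaps >= pos_thresh] = 1               # anchors above the IoU threshold

    num_fg = int(fg_fraction * batch_size)               # subsample positives if too many
    fg = np.where(labels == 1)[0]
    if len(fg) > num_fg:
        labels[rng.choice(fg, len(fg) - num_fg, replace=False)] = -1
    num_bg = batch_size - np.sum(labels == 1)            # negatives fill the rest
    bg = np.where(labels == 0)[0]
    if len(bg) > num_bg:
        labels[rng.choice(bg, len(bg) - num_bg, replace=False)] = -1
    return labels

anchors = np.array([[0, 0, 99, 99], [10, 10, 109, 109], [30, 30, 129, 129],
                    [40, 40, 139, 139], [200, 200, 299, 299]], dtype=float)
gt = np.array([[5, 5, 104, 104]], dtype=float)
labels = label_anchors(anchors, gt, rng=np.random.default_rng(0))
num_examples = np.sum(labels >= 0)
outside_weight = 1.0 / num_examples      # the uniform RPN_POSITIVE_WEIGHT < 0 case
print(labels, outside_weight)            # IoUs ~0.82, 0.82, 0.39, 0.27, 0.0
```

The anchor with IoU 0.39 falls between the two thresholds and stays -1, so it contributes to neither loss.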
lib/nets/network.py
  def _anchor_target_layer(self, rpn_cls_score, name):
    with tf.variable_scope(name) as scope:
      rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
        anchor_target_layer,
        [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
        [tf.float32, tf.float32, tf.float32, tf.float32],
        name="anchor_target")

      rpn_labels.set_shape([1, 1, None, None])
      rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
      rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
      rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])

      rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
      self._anchor_targets['rpn_labels'] = rpn_labels
      self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets
      self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights
      self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights

      self._score_summaries.update(self._anchor_targets)

    return rpn_labels
lib/layer_utils/anchor_target_layer.py
def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, all_anchors, num_anchors):
  """Same as the anchor target layer in original Fast/er RCNN """
  A = num_anchors
  total_anchors = all_anchors.shape[0]
  K = total_anchors / num_anchors

  # allow boxes to sit over the edge by a small amount
  _allowed_border = 0

  # map of shape (..., H, W)
  height, width = rpn_cls_score.shape[1:3]

  # only keep anchors inside the image
  inds_inside = np.where(
    (all_anchors[:, 0] >= -_allowed_border) &
    (all_anchors[:, 1] >= -_allowed_border) &
    (all_anchors[:, 2] < im_info[1] + _allowed_border) &  # width
    (all_anchors[:, 3] < im_info[0] + _allowed_border)  # height
  )[0]

  # keep only inside anchors
  anchors = all_anchors[inds_inside, :]

  # label: 1 is positive, 0 is negative, -1 is dont care
  labels = np.empty((len(inds_inside),), dtype=np.float32)
  labels.fill(-1)

  # overlaps between the anchors and the gt boxes
  # overlaps (ex, gt)
  overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float64),  # np.float was removed in NumPy 1.24
    np.ascontiguousarray(gt_boxes, dtype=np.float64))
  argmax_overlaps = overlaps.argmax(axis=1)
  max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
  gt_argmax_overlaps = overlaps.argmax(axis=0)
  gt_max_overlaps = overlaps[gt_argmax_overlaps,
                             np.arange(overlaps.shape[1])]
  gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

  if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels first so that positive labels can clobber them
    # first set the negatives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

  # fg label: for each gt, anchor with highest overlap
  labels[gt_argmax_overlaps] = 1

  # fg label: above threshold IOU
  labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

  if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels last so that negative labels can clobber positives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

  # subsample positive labels if we have too many
  num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
  fg_inds = np.where(labels == 1)[0]
  if len(fg_inds) > num_fg:
    disable_inds = npr.choice(
      fg_inds, size=(len(fg_inds) - num_fg), replace=False)
    labels[disable_inds] = -1

  # subsample negative labels if we have too many
  num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
  bg_inds = np.where(labels == 0)[0]
  if len(bg_inds) > num_bg:
    disable_inds = npr.choice(
      bg_inds, size=(len(bg_inds) - num_bg), replace=False)
    labels[disable_inds] = -1

  bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
  bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])

  bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
  # only the positive ones have regression targets
  bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)

  bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
  if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
    # uniform weighting of examples (given non-uniform sampling)
    num_examples = np.sum(labels >= 0)
    positive_weights = np.ones((1, 4)) * 1.0 / num_examples
    negative_weights = np.ones((1, 4)) * 1.0 / num_examples
  else:
    assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
            (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
    positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                        np.sum(labels == 1))
    negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                        np.sum(labels == 0))
  bbox_outside_weights[labels == 1, :] = positive_weights
  bbox_outside_weights[labels == 0, :] = negative_weights

  # map up to original set of anchors
  labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
  bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
  bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
  bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)

  # labels
  labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
  labels = labels.reshape((1, 1, A * height, width))
  rpn_labels = labels

  # bbox_targets
  bbox_targets = bbox_targets \
    .reshape((1, height, width, A * 4))

  rpn_bbox_targets = bbox_targets
  # bbox_inside_weights
  bbox_inside_weights = bbox_inside_weights \
    .reshape((1, height, width, A * 4))

  rpn_bbox_inside_weights = bbox_inside_weights

  # bbox_outside_weights
  bbox_outside_weights = bbox_outside_weights \
    .reshape((1, height, width, A * 4))

  rpn_bbox_outside_weights = bbox_outside_weights
  return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights

def _compute_targets(ex_rois, gt_rois):
  """Compute bounding-box regression targets for an image."""

  assert ex_rois.shape[0] == gt_rois.shape[0]
  assert ex_rois.shape[1] == 4
  assert gt_rois.shape[1] == 5

  return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
lib/model/bbox_transform.py
def bbox_transform(ex_rois, gt_rois):
  ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
  ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
  ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
  ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights

  gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
  gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
  gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
  gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

  targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
  targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
  targets_dw = np.log(gt_widths / ex_widths)
  targets_dh = np.log(gt_heights / ex_heights)

  targets = np.vstack(
    (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
  return targets

6瞭稼、rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")

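From the 2000 proposals produced by the RPN, this step selects __C.TRAIN.BATCH_SIZE (128) RoIs as one training batch. Up to 25% of them (cfg.TRAIN.FG_FRACTION) are foreground and the rest are background. When cfg.TRAIN.USE_GT is set, the ground-truth boxes are also added to the candidate pool.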

lib/nets/network.py
  def _proposal_target_layer(self, rois, roi_scores, name):
    with tf.variable_scope(name) as scope:
      rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = tf.py_func(
        proposal_target_layer,
        [rois, roi_scores, self._gt_boxes, self._num_classes],
        [tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32],
        name="proposal_target")

      rois.set_shape([cfg.TRAIN.BATCH_SIZE, 5])
      roi_scores.set_shape([cfg.TRAIN.BATCH_SIZE])
      labels.set_shape([cfg.TRAIN.BATCH_SIZE, 1])
      bbox_targets.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_inside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_outside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])

      self._proposal_targets['rois'] = rois
      self._proposal_targets['labels'] = tf.to_int32(labels, name="to_int32")
      self._proposal_targets['bbox_targets'] = bbox_targets
      self._proposal_targets['bbox_inside_weights'] = bbox_inside_weights
      self._proposal_targets['bbox_outside_weights'] = bbox_outside_weights

      self._score_summaries.update(self._proposal_targets)

      return rois, roi_scores
lib/layer_utils/proposal_target_layer.py
def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
  """
  Assign object detection proposals to ground-truth targets. Produces proposal
  classification labels and bounding-box regression targets.
  """

  # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
  # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
  all_rois = rpn_rois
  all_scores = rpn_scores

  # Include ground-truth boxes in the set of candidate rois
  if cfg.TRAIN.USE_GT:
    zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
    all_rois = np.vstack(
      (all_rois, np.hstack((zeros, gt_boxes[:, :-1])))
    )
    # not sure if it a wise appending, but anyway i am not using it
    all_scores = np.vstack((all_scores, zeros))

  num_images = 1
  rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
  fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)

  # Sample rois with classification labels and bounding box regression
  # targets
  labels, rois, roi_scores, bbox_targets, bbox_inside_weights = _sample_rois(
    all_rois, all_scores, gt_boxes, fg_rois_per_image,
    rois_per_image, _num_classes)

  rois = rois.reshape(-1, 5)
  roi_scores = roi_scores.reshape(-1)
  labels = labels.reshape(-1, 1)
  bbox_targets = bbox_targets.reshape(-1, _num_classes * 4)
  bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes * 4)
  bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)

  return rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
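The two less obvious steps in proposal_target_layer are the USE_GT branch and the derivation of bbox_outside_weights. The sketch below uses made-up boxes to show both. It assumes rpn_scores has shape (N, 1), which the np.vstack call requires. In the USE_GT branch, ground-truth boxes become RoIs by dropping their class column and prepending batch index 0. The outside weights are simply 1 wherever an inside weight is positive.

```python
import numpy as np

# Hypothetical RPN output: rois are (batch_id, x1, y1, x2, y2), scores are (N, 1).
rpn_rois = np.array([[0, 10, 10, 50, 50],
                     [0, 30, 40, 90, 120]], dtype=np.float32)
rpn_scores = np.array([[0.9], [0.4]], dtype=np.float32)
# Hypothetical ground truth: (x1, y1, x2, y2, class).
gt_boxes = np.array([[12, 8, 48, 52, 3]], dtype=np.float32)

# USE_GT branch: drop the class column, prepend batch index 0, give score 0.
zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
all_rois = np.vstack((rpn_rois, np.hstack((zeros, gt_boxes[:, :-1]))))
all_scores = np.vstack((rpn_scores, zeros))
print(all_rois[-1])  # the ground-truth box, now in RoI form

# Outside weights are 1 exactly where an inside weight is positive.
bbox_inside_weights = np.array([[0, 0, 0, 0, 1, 1, 1, 1]], dtype=np.float32)
bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
```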


def _get_bbox_regression_labels(bbox_target_data, num_classes):
  """Bounding-box regression targets (bbox_target_data) are stored in a
  compact form N x (class, tx, ty, tw, th)

  This function expands those targets into the 4-of-4*K representation used
  by the network (i.e. only one class has non-zero targets).

  Returns:
      bbox_target (ndarray): N x 4K blob of regression targets
      bbox_inside_weights (ndarray): N x 4K blob of loss weights
  """

  clss = bbox_target_data[:, 0]
  bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
  bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
  inds = np.where(clss > 0)[0]
  for ind in inds:
    cls = clss[ind]
    start = int(4 * cls)
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
  return bbox_targets, bbox_inside_weights
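_get_bbox_regression_labels writes each foreground row's four targets into the slot for its class. As a sketch, the same expansion can be written without the Python loop by using NumPy fancy indexing. The expand_targets name is hypothetical. inside_weights defaults to (1, 1, 1, 1) here, which is an assumption standing in for cfg.TRAIN.BBOX_INSIDE_WEIGHTS.

```python
import numpy as np


def expand_targets(bbox_target_data, num_classes, inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Vectorized sketch of _get_bbox_regression_labels.

    bbox_target_data: N x 5 rows of (class, tx, ty, tw, th).
    inside_weights: assumed stand-in for cfg.TRAIN.BBOX_INSIDE_WEIGHTS.
    """
    clss = bbox_target_data[:, 0].astype(np.int64)
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    fg = np.where(clss > 0)[0]                          # background rows stay all-zero
    cols = 4 * clss[fg][:, np.newaxis] + np.arange(4)   # F x 4 column indices
    bbox_targets[fg[:, np.newaxis], cols] = bbox_target_data[fg, 1:]
    bbox_inside_weights[fg[:, np.newaxis], cols] = inside_weights
    return bbox_targets, bbox_inside_weights


data = np.array([[2, 0.1, 0.2, 0.3, 0.4],    # foreground RoI of class 2
                 [0, 0.5, 0.5, 0.5, 0.5]],   # background RoI
                dtype=np.float32)
targets, weights = expand_targets(data, num_classes=3)
# Row 0 is non-zero only in columns 8..11 (class 2's slot); row 1 is all zeros.
```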


def _compute_targets(ex_rois, gt_rois, labels):
  """Compute bounding-box regression targets for an image."""

  assert ex_rois.shape[0] == gt_rois.shape[0]
  assert ex_rois.shape[1] == 4
  assert gt_rois.shape[1] == 4

  targets = bbox_transform(ex_rois, gt_rois)
  if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
    # Optionally normalize targets by a precomputed mean and stdev
    targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
               / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
  return np.hstack(
    (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
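When cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED is set, targets are standardized before training. Predicted deltas must therefore be un-standardized before they are decoded. The means (0, 0, 0, 0) and standard deviations (0.1, 0.1, 0.2, 0.2) below are assumed values, so check cfg.TRAIN.BBOX_NORMALIZE_MEANS / BBOX_NORMALIZE_STDS for the real ones. The function names are hypothetical.

```python
import numpy as np

# Assumed values; check cfg.TRAIN.BBOX_NORMALIZE_MEANS / _STDS in your config.
MEANS = np.array([0.0, 0.0, 0.0, 0.0])
STDS = np.array([0.1, 0.1, 0.2, 0.2])


def normalize_targets(targets):
    # Training side: same arithmetic as the BBOX_NORMALIZE_TARGETS_PRECOMPUTED branch.
    return (targets - MEANS) / STDS


def unnormalize_deltas(deltas):
    # Inference side: undo the standardization before decoding boxes.
    return deltas * STDS + MEANS


targets = np.array([[0.625, 0.25, np.log(2.0), 0.0]])
labels = np.array([3])
normed = normalize_targets(targets)   # dx, dy scaled by 10; dw, dh by 5
# _compute_targets then prepends the label: each row becomes (label, tx, ty, tw, th).
rows = np.hstack((labels[:, np.newaxis], normed)).astype(np.float32)
```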


def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
  """Generate a random sample of RoIs comprising foreground and background
  examples.
  """
  # overlaps: (rois x gt_boxes)
  overlaps = bbox_overlaps(
    np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float64),
    np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float64))
  gt_assignment = overlaps.argmax(axis=1)
  max_overlaps = overlaps.max(axis=1)
  labels = gt_boxes[gt_assignment, 4]

  # Select foreground RoIs as those with >= FG_THRESH overlap
  fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
  # Guard against the case when an image has fewer than fg_rois_per_image
  # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
  bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                     (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]

  # Small modification to the original version where we ensure a fixed number of regions are sampled
  if fg_inds.size > 0 and bg_inds.size > 0:
    fg_rois_per_image = min(fg_rois_per_image, fg_inds.size)
    fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_image), replace=False)
    bg_rois_per_image = rois_per_image - fg_rois_per_image
    to_replace = bg_inds.size < bg_rois_per_image
    bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_image), replace=to_replace)
  elif fg_inds.size > 0:
    to_replace = fg_inds.size < rois_per_image
    fg_inds = npr.choice(fg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = rois_per_image
  elif bg_inds.size > 0:
    to_replace = bg_inds.size < rois_per_image
    bg_inds = npr.choice(bg_inds, size=int(rois_per_image), replace=to_replace)
    fg_rois_per_image = 0
  else:
    import pdb
    pdb.set_trace()

  # The indices that we're selecting (both fg and bg)
  keep_inds = np.append(fg_inds, bg_inds)
  # Select sampled values from various arrays:
  labels = labels[keep_inds]
  # Clamp labels for the background RoIs to 0
  labels[int(fg_rois_per_image):] = 0
  rois = all_rois[keep_inds]
  roi_scores = all_scores[keep_inds]

  bbox_target_data = _compute_targets(
    rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)

  bbox_targets, bbox_inside_weights = \
    _get_bbox_regression_labels(bbox_target_data, num_classes)

  return labels, rois, roi_scores, bbox_targets, bbox_inside_weights
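The branching in _sample_rois is easier to follow on synthetic data. Below is a simplified sketch that takes max_overlaps directly instead of calling bbox_overlaps. The threshold defaults (0.5 / 0.5 / 0.1) are assumptions standing in for cfg.TRAIN.FG_THRESH, BG_THRESH_HI and BG_THRESH_LO. A seeded np.random.default_rng replaces npr so the example is reproducible. The function name sample_fg_bg is hypothetical.

```python
import numpy as np


def sample_fg_bg(max_overlaps, rois_per_image=128, fg_fraction=0.25,
                 fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.1, rng=None):
    """Simplified sketch of the fg/bg sampling in _sample_rois.

    Threshold defaults are assumptions; the real values come from cfg.TRAIN.
    Returns the kept indices (foreground first) and the foreground count.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_thresh_hi) &
                       (max_overlaps >= bg_thresh_lo))[0]
    fg_quota = int(round(fg_fraction * rois_per_image))  # 128 * 0.25 -> 32

    if fg_inds.size > 0 and bg_inds.size > 0:
        num_fg = min(fg_quota, fg_inds.size)
        fg_inds = rng.choice(fg_inds, size=num_fg, replace=False)
        num_bg = rois_per_image - num_fg
        # Sample background with replacement only if there are too few candidates.
        bg_inds = rng.choice(bg_inds, size=num_bg, replace=bg_inds.size < num_bg)
    elif fg_inds.size > 0:
        num_fg = rois_per_image
        fg_inds = rng.choice(fg_inds, size=num_fg, replace=fg_inds.size < num_fg)
    elif bg_inds.size > 0:
        num_fg = 0
        bg_inds = rng.choice(bg_inds, size=rois_per_image,
                             replace=bg_inds.size < rois_per_image)
    else:
        # The original drops into pdb here; raising is this sketch's choice.
        raise ValueError("no foreground or background RoIs to sample")

    return np.append(fg_inds, bg_inds), num_fg


overlaps = np.concatenate([np.full(10, 0.7),    # 10 foreground candidates
                           np.full(500, 0.2),   # 500 background candidates
                           np.full(40, 0.05)])  # ignored: below bg_thresh_lo
keep, num_fg = sample_fg_bg(overlaps)
print(keep.size, num_fg)  # 128 10
```

Here only 10 foreground candidates exist, so the 32-RoI foreground quota cannot be met. The remaining 118 slots go to background. The original code then sets labels[fg_rois_per_image:] to 0, so every sampled background RoI gets class 0.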

ROI layer

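The inputs to this layer are the rois and the feature map produced by passing the original image through the feature network.
The ROI layer does three things:

  1. The rois are in original-image coordinates, so the first step maps them onto the feature map.
  2. Each mapped region is divided into sections of equal size.
  3. Max pooling is applied to each section.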
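These three steps describe classic RoI max pooling. The repository approximates them with crop_and_resize followed by a 2 × 2 max pool. A direct NumPy sketch of the three steps may still help. The stride of 16 and the 7 × 7 output are assumed values, and the rounding choices belong to this sketch, not to the original Fast R-CNN implementation.

```python
import numpy as np


def roi_max_pool(feature, roi, stride=16, out_size=7):
    """Sketch of classic RoI max pooling for one RoI.

    feature: (H, W, C) feature map; roi: (x1, y1, x2, y2) in image pixels.
    stride and out_size are assumed values.
    """
    H, W, _ = feature.shape
    # Step 1: map image coordinates onto the feature map by dividing by the stride.
    x1, y1, x2, y2 = (int(round(c / stride)) for c in roi)
    x1, x2 = min(max(x1, 0), W - 1), min(max(x2, 0), W - 1)
    y1, y2 = min(max(y1, 0), H - 1), min(max(y2, 0), H - 1)
    x2, y2 = max(x2, x1), max(y2, y1)  # keep at least one cell
    region = feature[y1:y2 + 1, x1:x2 + 1]
    # Step 2: split the region into out_size x out_size roughly equal sections.
    ys = np.linspace(0, region.shape[0], out_size + 1)
    xs = np.linspace(0, region.shape[1], out_size + 1)
    out = np.empty((out_size, out_size, feature.shape[2]), dtype=feature.dtype)
    for i in range(out_size):
        y_lo = int(np.floor(ys[i]))
        y_hi = max(int(np.ceil(ys[i + 1])), y_lo + 1)  # every section non-empty
        for j in range(out_size):
            x_lo = int(np.floor(xs[j]))
            x_hi = max(int(np.ceil(xs[j + 1])), x_lo + 1)
            # Step 3: max pooling inside each section, per channel.
            out[i, j] = region[y_lo:y_hi, x_lo:x_hi].max(axis=(0, 1))
    return out


feature = np.arange(64 * 64, dtype=np.float32).reshape(64, 64, 1)
pooled = roi_max_pool(feature, roi=(0, 0, 1008, 1008))
print(pooled.shape)  # (7, 7, 1)
```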
The code itself is quite clear:
lib/nets/network.py
  def _crop_pool_layer(self, bottom, rois, name):
    with tf.variable_scope(name) as scope:
      batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
      # Get the normalized coordinates of bounding boxes
      bottom_shape = tf.shape(bottom)
      height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
      width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
      x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
      y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
      x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
      y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
      # Won't be back-propagated to rois anyway, but to save time
      bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
      pre_pool_size = cfg.POOLING_SIZE * 2
      crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")

    return slim.max_pool2d(crops, [2, 2], padding='SAME')

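bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1)) stacks the normalized coordinates in the (y1, x1, y2, x2) order that crop_and_resize expects and stops the gradient there. As the code comment says, gradients would not be back-propagated to the rois anyway, so stop_gradient just saves computation.
crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops") crops each RoI out of the feature map and bilinearly resizes it to pre_pool_size × pre_pool_size, i.e. 14 × 14 (cfg.POOLING_SIZE * 2, with POOLING_SIZE = 7). The output therefore has a fixed size no matter how large the RoI or the input image is.
return slim.max_pool2d(crops, [2, 2], padding='SAME') produces the layer's output. It applies 2 × 2 max pooling with slim's default stride of 2, so each RoI ends up as a 7 × 7 feature map.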
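tf.image.crop_and_resize maps a normalized coordinate y to y * (image_height - 1) in its input tensor. Dividing an image-pixel coordinate by (feat_h - 1) * feat_stride therefore lands at pixel / feat_stride on the feature map, which is the mapping in step 1 of the list above. The NumPy sketch below mirrors that arithmetic and the final 2 × 2, stride-2 max pool. The feature-map size 38 × 50, the stride of 16 and the helper names are hypothetical.

```python
import numpy as np


def normalized_boxes(rois, feat_h, feat_w, feat_stride=16):
    """Mirror of the coordinate arithmetic in _crop_pool_layer (NumPy sketch).

    rois: N x 5 rows of (batch_id, x1, y1, x2, y2) in image pixels.
    Returns N x 4 rows of (y1, x1, y2, x2), the order crop_and_resize expects.
    """
    height = (feat_h - 1.0) * feat_stride
    width = (feat_w - 1.0) * feat_stride
    x1, y1, x2, y2 = (rois[:, k] for k in range(1, 5))
    return np.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)


def max_pool_2x2(crops):
    """2 x 2 max pooling with stride 2: N x 14 x 14 x C -> N x 7 x 7 x C."""
    n, h, w, c = crops.shape
    return crops.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


# Hypothetical 38 x 50 feature map: (38 - 1) * 16 = 592, (50 - 1) * 16 = 784.
rois = np.array([[0, 0, 0, 784, 592],
                 [0, 392, 296, 784, 592]], dtype=np.float32)
boxes = normalized_boxes(rois, feat_h=38, feat_w=50)   # [[0, 0, 1, 1], [0.5, 0.5, 1, 1]]
crops = np.arange(2 * 14 * 14, dtype=np.float32).reshape(2, 14, 14, 1)
pooled = max_pool_2x2(crops)                            # shape (2, 7, 7, 1)
```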

Building the fully connected layer

fc7 = self._head_to_tail(pool5, is_training)
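The code below needs little explanation, but it is not a plain fully connected layer. resnet_v1.resnet_v1 runs pool5 through the last ResNet block (self._blocks[-1:]). tf.reduce_mean over axes 1 and 2 then average-pools the spatial dimensions into the fc7 feature vector.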

lib/nets/resnet_v1.py
  def _head_to_tail(self, pool5, is_training, reuse=None):
    with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
      fc7, _ = resnet_v1.resnet_v1(pool5,
                                   self._blocks[-1:],
                                   global_pool=False,
                                   include_root_block=False,
                                   reuse=reuse,
                                   scope=self._scope)
      # average pooling done by reduce_mean
      fc7 = tf.reduce_mean(fc7, axis=[1, 2])
    return fc7
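The last ResNet block is fully convolutional, so the tf.reduce_mean call is what collapses each RoI's spatial map into a single vector. In NumPy terms this is global average pooling. The (3, 7, 7, 2048) block output below is a hypothetical shape chosen for illustration.

```python
import numpy as np

# Hypothetical output of the last ResNet block for 3 RoIs: (N, h, w, channels).
rng = np.random.default_rng(0)
block_out = rng.standard_normal((3, 7, 7, 2048)).astype(np.float32)

# Equivalent of tf.reduce_mean(fc7, axis=[1, 2]): global average pooling.
fc7 = block_out.mean(axis=(1, 2))
print(fc7.shape)  # (3, 2048)
```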

Region classification and bounding-box regression

cls_prob, bbox_pred = self._region_classification(fc7, is_training,
                                                  initializer, initializer_bbox)
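This code produces the output of the whole network. cls_score, cls_prob and cls_pred give the classification, and bbox_pred holds 4 regression values per class (num_classes * 4 in total).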

lib/nets/network.py
  def _region_classification(self, fc7, is_training, initializer, initializer_bbox):
    cls_score = slim.fully_connected(fc7, self._num_classes, 
                                       weights_initializer=initializer,
                                       trainable=is_training,
                                       activation_fn=None, scope='cls_score')
    cls_prob = self._softmax_layer(cls_score, "cls_prob")
    cls_pred = tf.argmax(cls_score, axis=1, name="cls_pred")
    bbox_pred = slim.fully_connected(fc7, self._num_classes * 4, 
                                     weights_initializer=initializer_bbox,
                                     trainable=is_training,
                                     activation_fn=None, scope='bbox_pred')

    self._predictions["cls_score"] = cls_score
    self._predictions["cls_pred"] = cls_pred
    self._predictions["cls_prob"] = cls_prob
    self._predictions["bbox_pred"] = bbox_pred

    return cls_prob, bbox_pred
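cls_pred is computed from cls_score rather than cls_prob. Both give the same answer because softmax preserves the ordering within each row. The sketch below uses hypothetical scores. It also shows one common way to consume the N × 4K bbox_pred: pick the four deltas that belong to the predicted class. The helper class_specific_deltas is a hypothetical name introduced here, not a function from the repository.

```python
import numpy as np


def softmax(x):
    # Row-wise softmax, shifted by the row max for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def class_specific_deltas(bbox_pred, cls):
    # bbox_pred is N x 4K; class k's (dx, dy, dw, dh) sit in columns 4k .. 4k+3.
    cols = 4 * cls[:, np.newaxis] + np.arange(4)
    return np.take_along_axis(bbox_pred, cols, axis=1)


cls_score = np.array([[2.0, 0.5, 1.0],      # hypothetical scores, K = 3 classes
                      [0.1, 0.2, 3.0]])
cls_prob = softmax(cls_score)
cls_pred = cls_score.argmax(axis=1)          # same result as argmax of cls_prob
bbox_pred = np.arange(2 * 12, dtype=np.float64).reshape(2, 12)
print(cls_pred)                              # [0 2]
deltas = class_specific_deltas(bbox_pred, cls_pred)  # row 0: cols 0..3, row 1: cols 8..11
```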

That completes the training part of Faster R-CNN. The testing code is simpler and largely mirrors training, so this article does not cover it.
