A Brief Overview of How Faster R-CNN Works
The figure above is the schematic of Faster R-CNN:
- 1. First build the base of Faster R-CNN: a fully convolutional network.
- 2. The fully convolutional network max-pools the original image; VGG16 applies four 2x2 max-pooling steps, so the final feature map is 1/16 the size of the input image.
- 3. The last layer of the fully convolutional network (called net below, which is also the variable name used in the code) feeds two branches. One copy of net goes into the RPN for region proposals, which produces box coordinates and a score for each box (a binary classification: object or no object). The boxes are then used to crop features from the other copy of net, and all cropped feature maps are resized to a fixed size. These are the so-called ROIs.
- 4. Once all ROIs are obtained, they are flattened and sent to two heads: one classifies the object, the other regresses the box coordinates (essentially a fine-tuning of the boxes from the previous stage).
- 5. The result is the refined box coordinates and the per-class scores.
- 6. Remember, remember: there is a pitfall here. The parameter settings differ between training and inference. In particular, when sampling positive and negative examples during training, the proposals are compared against the ground truth (the true box positions) at exactly this point; at inference time there is no ground truth, so the ground-truth matching is skipped and only top-N selection and NMS are used.
- 7. Note that there are two regressions and two classifications. The first pair lives in the RPN: coarse box selection and a binary object/no-object classification. The second pair lives in the prediction network after the ROIs: precise box regression and object classification (20 classes for Pascal VOC).
(Note: the principle of Faster R-CNN is simple, but by far the most complicated part is the data processing. That data-processing code has no trainable parameters, yet it accounts for 90% of the code.)
Code Walkthrough
This is my working copy of the project, downloaded from GitHub. There are too many versions online and I did not want to go through them all. I originally meant to dive into Google's object detection API, but it helps to first read a somewhat simpler codebase to sort out the ideas.
Code Preliminaries
The whole Faster R-CNN pipeline is driven by the base class Network in network.py; the entire flow is implemented through subclasses of Network, so you can build several detection models simply by writing several Network subclasses. The source ships with two: vgg16 and resnetv1.
# vgg16.py
class vgg16(Network):
def __init__(self, batch_size=1):
Network.__init__(self, batch_size=batch_size)
# resnetv1.py
class resnetv1(Network):
def __init__(self, batch_size=1, num_layers=50):
Network.__init__(self, batch_size=batch_size)
self._num_layers = num_layers
self._resnet_scope = 'resnet_v1_%d' % num_layers
Getting started: demo.py and train.py
- demo.py
Builds a VGG16-based Faster R-CNN model, restores the trained parameters from a ckpt file, and runs detection on an input image.
- train.py
Builds a VGG16-based Faster R-CNN model, restores the pre-trained parameters from a ckpt file, and trains on the input images.
Both demo.py and train.py contain a call like this:
# demo.py
net.create_architecture(sess, "TEST", 21,
tag='default', anchor_scales=[8, 16, 32])
# train.py
layers = self.net.create_architecture(sess, "TRAIN",
self.imdb.num_classes, tag='default')
This is the call that builds the Faster R-CNN computation graph.
Keep in mind that parameter restoration differs between train.py and demo.py: demo.py restores and assigns all parameters, while train.py restores only up to fc7. fc7 outputs 4096 units, and two heads are attached after it: the box coordinates and the class scores. If your own training data does not have the same number of classes, you only need to change num_classes in train.py; the layers after fc7 are then built to fit the new classification task.
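A rough sketch of that (the 11-class count and the surrounding session setup are made up for illustration; the import path is assumed from the repository layout shown above): only num_classes changes for a custom dataset, and the cls_score / bbox_pred layers after fc7 are then created with the new size rather than restored from the checkpoint.
import tensorflow as tf
from lib.nets.vgg16 import vgg16   # assumed module path, matching lib.nets.network above
sess = tf.Session()
net = vgg16(batch_size=1)
# hypothetical dataset: 10 object classes + background = 11 classes
layers = net.create_architecture(sess, "TRAIN", 11, tag='default', anchor_scales=[8, 16, 32])
# fc7 -> cls_score now has 11 outputs and fc7 -> bbox_pred has 11 * 4 outputs;
# these two layers are trained from scratch instead of being restored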
Analyzing vgg16()
All this groundwork is just so the later analysis has something to refer back to.
The vgg16 class exposes only a handful of methods to the outside. The vgg16 skeleton:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import lib.config.config as cfg
from lib.nets.network import Network
class vgg16(Network):
def __init__(self, batch_size=1):
Network.__init__(self, batch_size=batch_size)
def build_network(self, sess, is_training=True):
亩鬼。。阿蝶。雳锋。
# rois 所有的rois框的坐標(biāo)的分類得分
# cls_prob 進(jìn)行_num_classes的分類得分,經(jīng)過softmax
# bbox_prediction 進(jìn)行 box的回歸
return rois, cls_prob, bbox_pred
def get_variables_to_restore(self, variables, var_keep_dic):
return variables_to_restore
def fix_variables(self, sess, pretrained_model):
....
def build_head(self, is_training):
# The fully convolutional backbone has five blocks, each with convolutions followed by a
# max-pooling op, except the last block, which has convolutions but no pooling.
.....
# the output is downscaled to 1/16 of the input image
return net
def build_rpn(self, net, is_training, initializer):
# Build anchor component
# the function that generates the nine anchors at every position
....
# The binary classification and the box regression run in parallel: the same kind of 1x1
# convolution is applied to the feature map, producing 4*k outputs, i.e. _num_anchors * 4 values
rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
# rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax
# rpn_cls_score_reshape, shape=[None, 2]: the same scores reshaped so the channel size is 2
return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
def build_proposals(self, is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score):
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
# rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax
# obtain suitable ROIs
# operate on the coordinates; rois are the selected boxes, roi_scores their objectness scores
.....
return rois
def build_predictions(self, net, rois, is_training, initializer, initializer_bbox):
# Crop image ROIs
# crop fixed-size ROI windows
.......
# box coordinate regression through fc7
bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
# cls_score: raw classification scores over _num_classes
# cls_prob: classification scores over _num_classes, after softmax
# bbox_prediction: box regression output
return cls_score, cls_prob, bbox_prediction
From the above, vgg16 seems to expose only seven usable methods, but remember that vgg16 inherits everything from Network, so every method of Network is available on vgg16 as well. Let's start unravelling it, beginning with create_architecture():
create_architecture()
def create_architecture(self, sess, mode, num_classes, tag=None, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
self._image = tf.placeholder(tf.float32, shape=[self._batch_size, None, None, 3])
self._im_info = tf.placeholder(tf.float32, shape=[self._batch_size, 3])
self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
self._tag = tag
self._num_classes = num_classes
self._mode = mode
self._anchor_scales = anchor_scales
self._num_scales = len(anchor_scales)
self._anchor_ratios = anchor_ratios
self._num_ratios = len(anchor_ratios)
# K anchor boxes per position
self._num_anchors = self._num_scales * self._num_ratios
training = mode == 'TRAIN'
testing = mode == 'TEST'
assert tag != None
# handle most of the regularizer here
weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.FLAGS.weight_decay)
if cfg.FLAGS.bias_decay:
biases_regularizer = weights_regularizer
else:
biases_regularizer = tf.no_regularizer
# list as many types of layers as possible, even if they are not used now
with arg_scope([slim.conv2d, slim.conv2d_in_plane,
slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected],
weights_regularizer=weights_regularizer,
biases_regularizer=biases_regularizer,
biases_initializer=tf.constant_initializer(0.0)):
# The arg_scope above only sets default parameters for a range of conv / deconv layers; the core call is the line below
# rois: the boxes produced by the ROI pooling layer
# cls_prob: the classification scores from the final fully connected layer
# bbox_pred: the per-class (21-way) box regression output
rois, cls_prob, bbox_pred = self.build_network(sess, training)
layers_to_output = {'rois': rois}
layers_to_output.update(self._predictions)
for var in tf.trainable_variables():
self._train_summaries.append(var)
if mode == 'TEST':
stds = np.tile(np.array(cfg.FLAGS2["bbox_normalize_stds"]), (self._num_classes))
means = np.tile(np.array(cfg.FLAGS2["bbox_normalize_means"]), (self._num_classes))
self._predictions["bbox_pred"] *= stds
self._predictions["bbox_pred"] += means
else:
self._add_losses()
layers_to_output.update(self._losses)
val_summaries = []
with tf.device("/cpu:0"):
val_summaries.append(self._add_image_summary(self._image, self._gt_boxes))
for key, var in self._event_summaries.items():
val_summaries.append(tf.summary.scalar(key, var))
for key, var in self._score_summaries.items():
self._add_score_summary(key, var)
for var in self._act_summaries:
self._add_act_summary(var)
for var in self._train_summaries:
self._add_train_summary(var)
self._summary_op = tf.summary.merge_all()
if not testing:
self._summary_op_val = tf.summary.merge(val_summaries)
return layers_to_output
create_architecture first defines the inputs:
the image itself, im_info (the image size), _gt_boxes (the ground-truth box labels) and _tag (a model tag such as 'default').
self._image = tf.placeholder(tf.float32, shape=[self._batch_size, None, None, 3])
self._im_info = tf.placeholder(tf.float32, shape=[self._batch_size, 3])
self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
self._tag = tag
The rest are network parameters that have to be set when the graph is built:
self._num_classes = num_classes  # number of classes
self._mode = mode  # train or test
self._anchor_scales = anchor_scales  # anchor scales
self._num_scales = len(anchor_scales)
self._anchor_ratios = anchor_ratios
self._num_ratios = len(anchor_ratios)
# K anchor boxes per position
self._num_anchors = self._num_scales * self._num_ratios  # number of anchors per position == 9
training = mode == 'TRAIN'
testing = mode == 'TEST'
Next comes the actual network construction, build_network():
# The arg_scope above only sets default parameters for a range of conv / deconv layers; the core call is the line below
# rois: the boxes produced by the ROI pooling layer
# cls_prob: the classification scores from the final fully connected layer
# bbox_pred: the per-class (21-way) box regression output
rois, cls_prob, bbox_pred = self.build_network(sess, training)
build_network produces the outputs of running the image through the network:
rois are the boxes from the ROI pooling layer,
cls_prob holds the scores from the final fully connected layer,
bbox_pred is the per-class (21-way) box regression output.
What?! Done already? Where is the process?!
Let's see what actually happens inside build_network.
build_network()
build_network() is implemented in vgg16:
def build_network(self, sess, is_training=True):
with tf.variable_scope('vgg_16', 'vgg_16'):
"""
Split into several stages: build_head, build_rpn, build_proposals, build_predictions,
corresponding exactly to the fully convolutional layers, the RPN, the proposal layer, and the final fully connected layers described above.
"""
# select initializer
if cfg.FLAGS.initializer == "truncated":
initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
else:
initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)
# Build head
# build the fully convolutional backbone (head)
# the output is downscaled to 1/16 of the input image
net = self.build_head(is_training)
# Build rpn
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
# rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax
# rpn_cls_score_reshape, shape=[None, 2]: the same scores reshaped so the channel size is 2
rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape = self.build_rpn(net, is_training, initializer)
# Build proposals
# filter the boxes: select suitable ROIs
rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
# Build predictions
# cls_score: raw classification scores over _num_classes
# cls_prob: classification scores over _num_classes, after softmax
# bbox_pred: box regression output
cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
self._predictions["rpn_cls_score"] = rpn_cls_score
self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape
self._predictions["rpn_cls_prob"] = rpn_cls_prob
self._predictions["rpn_bbox_pred"] = rpn_bbox_pred
self._predictions["cls_score"] = cls_score
self._predictions["cls_prob"] = cls_prob
self._predictions["bbox_pred"] = bbox_pred
self._predictions["rois"] = rois
self._score_summaries.update(self._predictions)
# rois: the coordinates of all the ROI boxes
# cls_prob: classification scores over _num_classes, after softmax
# bbox_pred: box regression output
return rois, cls_prob, bbox_pred
So the whole build comes down to these functions, executed in order:
build_head() ---> build_rpn() ---> build_proposals() ---> build_predictions()
- 1. build_head(): builds the base CNN (the fully convolutional backbone).
- 2. build_rpn(): generates box coordinates on the feature map and classifies whether each box contains an object.
- 3. build_proposals(): evaluates the boxes and selects suitable ones using IoU matching and NMS; no trainable parameters are created here.
- 4. build_predictions(): before the final classification and box regression there is an ROI layer that resizes all cropped feature maps to a fixed size and flattens them; it then produces two outputs, the box coordinates and the class scores.
With that we can dig into the code.
Starting with build_head():
build_head()
def build_head(self, is_training):
# The fully convolutional backbone has five blocks, each with convolutions followed by a
# max-pooling op, except the last block, which has convolutions but no pooling.
# Main network
# Layer 1
net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')
net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
# Layer 2
net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')
net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
# Layer 3
net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')
net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
# Layer 4
net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')
net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
# Layer 5
net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')
# Append network to summaries
self._act_summaries.append(net)
# Append network as head layer
self._layers['head'] = net
# the output is downscaled to 1/16 of the input image
return net
There is nothing surprising in this function: an image is pushed through the network for feature extraction, and net, the output of the last convolutional layer, is returned.
build_rpn()
def build_rpn(self, net, is_training, initializer):
# Build anchor component
# the function that generates the nine anchors at every position
self._anchor_component()
# Create RPN Layer
# First a 3x3 convolution; then 1x1 convolutions separate foreground from background and produce the scores
rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")
self._act_summaries.append(rpn)
# foreground/background scores
rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
# Change it so that the score has 2 as its channel size
# foreground/background scores, just reshaped, no softmax applied yet
rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
# apply softmax
rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
#
rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
# The binary classification and the box regression run in parallel: the same kind of 1x1
# convolution is applied to the feature map, producing 4*k outputs, i.e. _num_anchors * 4 values
rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
# rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax
# rpn_cls_score_reshape, shape=[None, 2]: the same scores reshaped so the channel size is 2
return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
build_rpn extracts the candidate boxes from the feature map. Its outputs are:
1. rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax.
2. rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression.
3. rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax.
4. rpn_cls_score_reshape, shape=[None, 2]: the same scores reshaped so the channel size is 2.
Note:
internally this function calls _anchor_component(), which generates nine anchors at every position, W x H x 9 anchors in total.
_anchor_component()
#
def _anchor_component(self):
with tf.variable_scope('ANCHOR_' + 'default'):
# generate_anchors() produces the anchor positions
# just to get the shape right
# feat_stride is the scale factor between the original image and the feature map; here it is 16
# _im_info[0, 0] / _im_info[0, 1]: the original image height / width
height = tf.to_int32(tf.ceil(self._im_info[0, 0] / np.float32(self._feat_stride[0])))
width = tf.to_int32(tf.ceil(self._im_info[0, 1] / np.float32(self._feat_stride[0])))
# the related code lives in the snippets module
# This produces all the anchors for the whole image: for a W x H feature map there are
# W x H x 9 anchors, and every anchor is already mapped back onto the original image,
# i.e. multiplied by 16
anchors, anchor_length = tf.py_func(generate_anchors_pre,
[height, width,
self._feat_stride, self._anchor_scales, self._anchor_ratios],
[tf.float32, tf.int32], name="generate_anchors")
anchors.set_shape([None, 4])
anchor_length.set_shape([])
self._anchors = anchors
self._anchor_length = anchor_length
Inside _anchor_component(), it is generate_anchors_pre() that actually generates all of the anchors.
generate_anchors_pre()
def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8,16,32), anchor_ratios=(0.5,1,2)):
""" A wrapper function to generate anchors given different scales
Also return the number of anchors in variable 'length'
"""
"""生成anchor的預(yù)處理方法,generate_anchors方法就是直接產(chǎn)生各種大小的anchor box闺金,generate_anchors_pre方法
是把每一個anchor box對應(yīng)到原圖上
height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))
width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))
feat_stride: 經(jīng)過VGG或者ZF后特征圖相對于原圖的在長或者寬上的縮放倍數(shù)逾滥,也就是說height和width對應(yīng)于特征圖長寬
anchor_scales:anchor尺寸
anchor_ratios: anchor長寬比
"""
# only 9 base anchors
anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales)) # produce the base anchor boxes of the various sizes
A = anchors.shape[0] # number of base anchors
shift_x = np.arange(0, width) * feat_stride # shifts of the feature-map columns in original-image coordinates
shift_y = np.arange(0, height) * feat_stride # shifts of the feature-map rows in original-image coordinates
shift_x, shift_y = np.meshgrid(shift_x, shift_y) # coordinate grids
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
K = shifts.shape[0]
# width changes faster, so here it is H, W, C
anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
# K x A x 4: this effectively lays the base anchors over the whole feature map
# (every positional shift is added to every base anchor)
# H x W x 9 boxes in total
anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
length = np.int32(anchors.shape[0])
return anchors, length
Of course, this in turn calls generate_anchors(), which produces the fixed set of boxes for a single point; with the parameters above that means 9 boxes on the original image:
generate_anchors():
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
scales=2 ** np.arange(3, 6)):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors
This function is pure bookkeeping on the feature map, with no trainable parameters, so it can be tested directly.
Below, the 9 boxes are drawn on a 1024 x 1024 image, centered at the point (365, 365):
import time
import numpy as np
import cv2
# Create a black image
t = time.time()
a = generate_anchors()
print(time.time() - t)
print(a)
img = np.zeros((1024,1024,3), np.uint8)
for i in a:
i = np.array(i) + 365
cv2.rectangle(img,(int(i[0]),int(i[1])),(int(i[2]),int(i[3])),(0,255,0),3)
cv2.imshow('line',img)
cv2.waitKey()
The resulting boxes look like this:
The coordinates are as follows; these values are offsets relative to the center (365, 365):
[[ -84. -40. 99. 55.]
[-176. -88. 191. 103.]
[-360. -184. 375. 199.]
[ -56. -56. 71. 71.]
[-120. -120. 135. 135.]
[-248. -248. 263. 263.]
[ -36. -80. 51. 95.]
[ -80. -168. 95. 183.]
[-168. -344. 183. 359.]]
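A quick sanity check of these numbers (just recomputing from the printout above): the widths and heights recover the three aspect ratios (h/w roughly 0.5, 1, 2) at areas of roughly 128^2, 256^2 and 512^2, i.e. scales 8, 16, 32 times the stride of 16.
import numpy as np
a = np.array([[ -84.,  -40.,   99.,   55.],
[-176.,  -88.,  191.,  103.],
[-360., -184.,  375.,  199.],
[ -56.,  -56.,   71.,   71.],
[-120., -120.,  135.,  135.],
[-248., -248.,  263.,  263.],
[ -36.,  -80.,   51.,   95.],
[ -80., -168.,   95.,  183.],
[-168., -344.,  183.,  359.]])
w = a[:, 2] - a[:, 0] + 1   # [184, 368, 736, 128, 256, 512,  88, 176, 352]
h = a[:, 3] - a[:, 1] + 1   # [ 96, 192, 384, 128, 256, 512, 176, 352, 704]
print(np.sqrt(w * h))       # roughly [133, 266, 531, 128, 256, 512, 124, 249, 498]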
Now we can work our way back.
Back to generate_anchors_pre()!
shift_x, shift_y = np.meshgrid(shift_x, shift_y) # coordinate grids
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
K = shifts.shape[0]
# width changes faster, so here it is H, W, C
anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
# K x A x 4: this effectively lays the base anchors over the whole feature map
# (every positional shift is added to every base anchor)
# H x W x 9 boxes in total
anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
length = np.int32(anchors.shape[0])
These steps expand the per-point anchors to the whole feature map. anchors and length are the final return values; anchors has shape [H*W*9, 4]. Each feature-map point corresponds to a 16-pixel receptive field in the original image, so a [2, 2] patch here covers [32, 32] pixels of the original image. There is no notion of batch size yet: anchors are generated for a single feature map. A toy check of the broadcasting follows.
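A minimal sketch of that broadcasting, using a made-up 2 x 2 feature map with stride 16 instead of a real one:
import numpy as np
feat_stride = 16
shift_x, shift_y = np.meshgrid(np.arange(2) * feat_stride, np.arange(2) * feat_stride)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
print(shifts)
# [[ 0  0  0  0]
#  [16  0 16  0]
#  [ 0 16  0 16]
#  [16 16 16 16]]
# shifts.reshape((1, K, 4)).transpose((1, 0, 2)) with K = 4 broadcasts against
# anchors.reshape((1, A, 4)) with A = 9, so every position receives all 9 base anchors:
# K * A = 36 boxes for this toy feature map.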
Back in _anchor_component():
anchors.set_shape([None, 4])
anchor_length.set_shape([])
self._anchors = anchors
self._anchor_length = anchor_length
Here the anchors tensor gets its static shape ([None, 4]), and anchor_length is recorded; it equals W x H x 9.
Back to build_rpn()
After the anchors are built, net goes through a [3, 3] convolution:
rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")
Then a [1, 1] convolution decides, for every feature-map position (for each of its anchors), whether there is an object: a binary classification.
# foreground/background scores
rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
Another [1, 1] convolution predicts the box values at every feature-map position: 4 values per anchor, the regression deltas that are later decoded into top-left and bottom-right corners.
rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
Back to build_network() again.
As just described, we now have the 9 anchors at every feature-map position, and the network has produced a prediction for each of them (object or background), i.e. scores for 9 boxes per position. That comes to roughly 20,000 boxes per image (see the rough count below), which is far too many. For training and inference alike, suitable boxes have to be selected before predicting further.
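A rough count of where the ~20,000 figure comes from (assuming, for illustration, a 600 x 1000 input and the stride of 16 used above):
import math
h = math.ceil(600 / 16.0)    # 38 feature-map rows
w = math.ceil(1000 / 16.0)   # 63 feature-map columns
print(h * w * 9)             # 21546 anchors, i.e. roughly 20,000 boxes per image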
build_proposals builds (selects) the suitable boxes for the next stage.
# filter the boxes: select suitable ROIs
rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
build_proposals()
def build_proposals(self, is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score):
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
# rpn_cls_score, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, before softmax
# obtain suitable ROIs
if is_training:
# operate on the coordinates; rois are the selected boxes, roi_scores their objectness scores
rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
# label anchors against the ground truth: positives are those with IoU above 70%
rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
# Try to have a deterministic order for the computing graph, for reproducibility
with tf.control_dependencies([rpn_labels]):
rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
else:
if cfg.FLAGS.test_mode == 'nms':
rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
elif cfg.FLAGS.test_mode == 'top':
rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
else:
raise NotImplementedError
return rois
As the code shows, this splits into a training case and a non-training case. Why? As mentioned earlier, during training the ground truth is available here, but at inference time there is no ground truth, so the two cases have to be handled differently.
Here is _proposal_layer():
rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
_proposal_layer()
_proposal_layer wraps proposal_layer(), so let's look at proposal_layer() directly.
def proposal_layer(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
# rpn_cls_prob, shape=[None, h, w, self._num_anchors * 2]: per-anchor object/background scores, after softmax
# rpn_bbox_pred, shape=[None, h, w, self._num_anchors * 4]: per-anchor box values, used for box regression
"""A simplified version compared to fast/er RCNN
For details please see the technical report
"""
"""
What proposal_layer does: its main task is to filter the candidate boxes and narrow the detection range.
As recalled in the overview above: first, candidates that overlap the ground truth by more than 70% are
kept and the rest dropped (this narrowing happens during training); second, non-maximum suppression (NMS)
is used to keep the top-n candidates by objectness score; third, after clipping out-of-bounds boxes, a
final selection is made from the top-n sorted scores.
"""
if type(cfg_key) == bytes:
cfg_key = cfg_key.decode('utf-8')
if cfg_key == "TRAIN":
pre_nms_topN = cfg.FLAGS.rpn_train_pre_nms_top_n
post_nms_topN = cfg.FLAGS.rpn_train_post_nms_top_n
nms_thresh = cfg.FLAGS.rpn_train_nms_thresh
else:
pre_nms_topN = cfg.FLAGS.rpn_test_pre_nms_top_n
post_nms_topN = cfg.FLAGS.rpn_test_post_nms_top_n
nms_thresh = cfg.FLAGS.rpn_test_nms_thresh
im_info = im_info[0]
# Get the scores and bounding boxes
scores = rpn_cls_prob[:, :, :, num_anchors:]
rpn_bbox_pred = rpn_bbox_pred.reshape((-1, 4))
scores = scores.reshape((-1, 1))
# The deltas encode a translation of the box center followed by a scaling of its size, so from the
# transform factors we recover pred_ctr_x, pred_ctr_y, pred_w and pred_h
proposals = bbox_transform_inv(anchors, rpn_bbox_pred)
proposals = clip_boxes(proposals, im_info[:2])
# Pick the top region proposals
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]
# Non-maximal suppression
keep = nms(np.hstack((proposals, scores)), nms_thresh)
# Pick the top region proposals after NMS
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
# Only support single image as input
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
return blob, scores
bbox_transform_inv(): the coordinate transform
bbox_transform_inv combines the RPN output with all the initial boxes and transforms their coordinates:
def bbox_transform_inv(boxes, deltas):
'''
Applies deltas to box coordinates to obtain new boxes, as described by
deltas
'''
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
# get the centers and widths/heights of the initial proposals
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
# get the coordinate transform deltas
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
# compute the centers and widths/heights of the transformed proposals
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
# convert the transformed centers and sizes back to top-left / bottom-right form
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
Written out, the transform is:
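(These equations follow directly from bbox_transform_inv above, with $(x_a, y_a, w_a, h_a)$ the anchor center and size and $(d_x, d_y, d_w, d_h)$ the predicted deltas.)

$$x = d_x w_a + x_a, \qquad y = d_y h_a + y_a, \qquad w = w_a e^{d_w}, \qquad h = h_a e^{d_h}$$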
clip_boxes then trims any transformed box that extends beyond the image boundary so that it stays inside the image. The clip_boxes function is shown below.
clip_boxes()
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
# strictly constrain all four corners of each proposal to lie inside the image
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
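A minimal usage sketch of clip_boxes (made-up numbers, assuming the function above is in scope): a box sticking out of a 600 x 800 image has its corners clamped to the image.
import numpy as np
boxes = np.array([[-10., 5., 850., 300.]])   # x1, y1, x2, y2
print(clip_boxes(boxes, (600, 800)))         # [[  0.   5. 799. 300.]]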
All boxes are then sorted by foreground score, and the top pre_nms_topN boxes are kept.
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]
For the boxes selected in the previous step, NMS removes overlapping boxes according to the threshold.
keep = nms(np.hstack((proposals, scores)), nms_thresh)
nms()
def py_cpu_nms(dets, thresh):
"""Pure Python NMS baseline."""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
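A small sketch of py_cpu_nms (made-up boxes, threshold 0.5, assuming the function above is in scope): box 1 overlaps box 0 with an IoU of about 0.7 and is suppressed; box 2 does not overlap and is kept.
import numpy as np
dets = np.array([[ 0.,  0., 10., 10., 0.9],    # box 0: highest score, kept first
[ 1.,  1., 11., 11., 0.8],    # box 1: IoU with box 0 ~ 0.70 > 0.5, suppressed
[20., 20., 30., 30., 0.7]])   # box 2: no overlap with box 0, kept
print(py_cpu_nms(dets, 0.5))  # [0, 2]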
From the remaining boxes, the final post_nms_topN boxes are selected.
# Pick the top region proposals after NMS
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
After the boxes are selected, a batch index has to be prepended so they can be matched to their feature map; since the batch size is 1, the index is always 0.
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
Back in build_proposals(), after _proposal_layer the positive and negative samples still have to be handled during training: anchors with an IoU above 70% against the ground truth are selected as positives.
def _anchor_target_layer(self, rpn_cls_score, name):
with tf.variable_scope(name):
rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
anchor_target_layer,
[rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
[tf.float32, tf.float32, tf.float32, tf.float32])
Finally, all box coordinates are returned. Note:
the boxes still have varying sizes; they are not yet normalized to a common size, which happens later in the ROI layer.
Back in build_network(), the last step is build_predictions():
cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
build_predictions()
def build_predictions(self, net, rois, is_training, initializer, initializer_bbox):
# Crop image ROIs
# crop fixed-size ROI windows
pool5 = self._crop_pool_layer(net, rois, "pool5")
pool5_flat = slim.flatten(pool5, scope='flatten')
# Fully connected layers
fc6 = slim.fully_connected(pool5_flat, 4096, scope='fc6')
if is_training:
fc6 = slim.dropout(fc6, keep_prob=0.5, is_training=True, scope='dropout6')
fc7 = slim.fully_connected(fc6, 4096, scope='fc7')
if is_training:
fc7 = slim.dropout(fc7, keep_prob=0.5, is_training=True, scope='dropout7')
# Scores and predictions
# classification over _num_classes through fc7
cls_score = slim.fully_connected(fc7, self._num_classes, weights_initializer=initializer, trainable=is_training, activation_fn=None, scope='cls_score')
cls_prob = self._softmax_layer(cls_score, "cls_prob")
# box coordinate regression through fc7
bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
# cls_score: raw classification scores over _num_classes
# cls_prob: classification scores over _num_classes, after softmax
# bbox_prediction: box regression output
return cls_score, cls_prob, bbox_prediction
The rois (box coordinates, not yet size-normalized; pool5 is the fixed-size feature map) are fed into the rest of the network for the final classification and localization.
The _crop_pool_layer() function is the crop-and-pool layer: it uses the box coordinates to find the corresponding regions on net.
It returns fixed-size feature maps, pool5.
def _crop_pool_layer(self, bottom, rois, name):
# fixed-size windows
with tf.variable_scope(name):
batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
# Get the normalized coordinates of bboxes
bottom_shape = tf.shape(bottom)
height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
# Won't be backpropagated to rois anyway, but to save time
bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
pre_pool_size = cfg.FLAGS.roi_pooling_size * 2
crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")
return slim.max_pool2d(crops, [2, 2], padding='SAME')
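A shape-only sketch of what _crop_pool_layer produces (assuming TF 1.x with slim available and cfg.FLAGS.roi_pooling_size == 7, the usual value, so pre_pool_size is 14): crop_and_resize yields 14 x 14 crops and the 2 x 2 max-pool turns them into 7 x 7 maps.
import tensorflow as tf
import tensorflow.contrib.slim as slim
feat = tf.placeholder(tf.float32, [1, 38, 50, 512])                # a conv5_3-sized feature map
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5],                         # normalized [y1, x1, y2, x2]
[0.2, 0.3, 0.9, 0.8]], dtype=tf.float32)
box_ind = tf.constant([0, 0], dtype=tf.int32)                      # both boxes come from image 0
crops = tf.image.crop_and_resize(feat, boxes, box_ind, [14, 14])   # shape (2, 14, 14, 512)
pool5 = slim.max_pool2d(crops, [2, 2], padding='SAME')             # shape (2, 7, 7, 512)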
pool5 is flattened and fed into fc6 -----> fc7; fc7 is a shared layer with two outputs, one for classification and one for box coordinate regression.
Classification:
cls_score = slim.fully_connected(fc7, self._num_classes, weights_initializer=initializer, trainable=is_training, activation_fn=None, scope='cls_score')
cls_prob = self._softmax_layer(cls_score, "cls_prob")
Box regression:
# box coordinate regression through fc7
bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
At this point, everything the object detection network itself does is finished.
In train.py the network then checks whether it is in TEST or TRAIN mode:
in TEST mode the computation ends here, while TRAIN mode still needs the loss computation.
def _add_losses(self, sigma_rpn=3.0):
with tf.variable_scope('loss_' + self._tag):
# RPN, class loss
rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])
rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])
rpn_select = tf.where(tf.not_equal(rpn_label, -1))
rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])
rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])
rpn_cross_entropy = tf.reduce_mean(
tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))
# RPN, bbox loss
rpn_bbox_pred = self._predictions['rpn_bbox_pred']
rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']
rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']
rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']
rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
# RCNN, class loss
cls_score = self._predictions["cls_score"]
label = tf.reshape(self._proposal_targets["labels"], [-1])
cross_entropy = tf.reduce_mean(
tf.nn.sparse_softmax_cross_entropy_with_logits(
logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))
# RCNN, bbox loss
bbox_pred = self._predictions['bbox_pred']
bbox_targets = self._proposal_targets['bbox_targets']
bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
bbox_outside_weights = self._proposal_targets['bbox_outside_weights']
loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)
self._losses['cross_entropy'] = cross_entropy
self._losses['loss_box'] = loss_box
self._losses['rpn_cross_entropy'] = rpn_cross_entropy
self._losses['rpn_loss_box'] = rpn_loss_box
loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
self._losses['total_loss'] = loss
self._event_summaries.update(self._losses)
return loss
Looking at the network as a whole, there are four loss terms: RPN box and RPN class, RCNN box and RCNN class. The box losses are regression losses and the class losses are cross-entropy losses. Summing all of them allows joint training.
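For reference, _smooth_l1_loss implements the standard Fast R-CNN smooth-L1 loss (sigma = 3 for the RPN box term, as passed above; the RCNN box term uses the function's default sigma), and the total loss is the plain sum of the four terms:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,(\sigma x)^2 & \text{if } |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2 & \text{otherwise} \end{cases}$$

$$L = L_{cls}^{RPN} + L_{box}^{RPN} + L_{cls}^{RCNN} + L_{box}^{RCNN}$$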
Update 2018-10-25
The code in train.py seems to differ from the paper here (it was not written by the original authors, after all). This model actually trains the RPN and Fast R-CNN jointly in a single step, as follows:
train_op aggregates all the losses; there is no stage-wise training.
layers = self.net.create_architecture(sess, "TRAIN", self.imdb.num_classes, tag='default')
loss = layers['total_loss']
lr = tf.Variable(cfg.FLAGS.learning_rate, trainable=False)
momentum = cfg.FLAGS.momentum
optimizer = tf.train.MomentumOptimizer(lr, momentum)
gvs = optimizer.compute_gradients(loss)
# Double bias
# Double the gradient of the bias if set
if cfg.FLAGS.double_bias:
final_gvs = []
with tf.variable_scope('Gradient_Mult'):
for grad, var in gvs:
scale = 1.
if cfg.FLAGS.double_bias and '/biases:' in var.name:
scale *= 2.
if not np.allclose(scale, 1.0):
grad = tf.multiply(grad, scale)
final_gvs.append((grad, var))
train_op = optimizer.apply_gradients(final_gvs)
else:
train_op = optimizer.apply_gradients(gvs)
....................................................
rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss = self.net.train_step(sess, blobs, train_op)
References:
詳細(xì)的Faster R-CNN源碼解析之proposal_layer和proposal_target_layer源碼解析
基于Tensorflow的目標(biāo)檢測(Detection)的代碼案例詳解