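Paper title:
RepPoints: Point Set Representation for Object Detection
Summary of ideas:
- Replaces the rectangular bounding box that object detection conventionally uses to represent a target with a set of points that describes the object's spatial extent.
- After feature extraction, uses deformable convolution to learn offsets from each object's center point, which give the positions of the point set.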
- Compares the approach with deformable RoI pooling, which the paper treats as related work rather than introducing it.
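- Proposes three conversion functions that turn the point set into a rectangular box, so the detector can be evaluated with the standard bounding-box metrics.
The RepPoints representation
Conventional object detection represents an object with a 4-d vector: the coordinates of the object's center point plus the width and height of its box.
RepPoints instead uses a set of points, $\mathcal{R} = \{(x_k, y_k)\}_{k=1}^{n}$, where n is the number of sample points (set to 9 in the paper). n should be a perfect square; the mmdetection code further requires an odd one, so that the points map onto a square deformable-convolution kernel.
As the figure shows, after the backbone extracts features, the RepPointsHead turns them into 9 points on the object. These 9 points form a pseudo box around the object, which is then converted into a conventional detection bbox.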
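Review of conventional multi-stage object detection
The conventional two-stage detection pipeline:
- Preset anchors cover a range of bounding-box scales and aspect ratios.
- For each anchor, the image feature at its center point is taken as the object feature. From it, the detector produces a confidence score for whether the anchor contains an object, and bounding-box regression produces a refined box (the bbox proposals).
- In the second stage, RoI-pooling or RoI-Align extracts object features from the bounding-box proposals obtained in step 2.
- Bounding-box regression on these refined features produces the final bounding-box targets.
- Multi-stage methods also use the refined features to regress intermediate refined bounding-box proposals (S2). This step can be repeated several times before the final bounding-box targets are generated, progressively correcting the box boundaries.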
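Bounding-box regression versus point-set regression
Progressively refining both the bounding-box localization and the feature extraction is crucial to the success of multi-stage detection methods.
For the bbox representation:
A 4-d regression vector $(\Delta x_p, \Delta y_p, \Delta w_p, \Delta h_p)$ maps the current bounding box proposal $B_p = (x_p, y_p, w_p, h_p)$ to a refined box:
$B_r = (x_p + w_p \Delta x_p,\; y_p + h_p \Delta y_p,\; w_p e^{\Delta w_p},\; h_p e^{\Delta h_p})$
Given the ground truth bounding box $B_t = (x_t, y_t, w_t, h_t)$, the loss should bring $B_r$ as close to the ground truth as possible, so training regresses toward the expected 4-d vector:
$\hat{F}(B_p, B_t) = \left(\frac{x_t - x_p}{w_p},\; \frac{y_t - y_p}{h_p},\; \log\frac{w_t}{w_p},\; \log\frac{h_t}{h_p}\right)$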
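To make the conventional 4-d parameterization concrete, here is a minimal sketch in plain Python. Boxes are assumed to be (center x, center y, width, height) tuples, and the function names `encode_bbox` and `decode_bbox` are my own, not taken from the paper or from mmdetection.

```python
import math


def encode_bbox(proposal, target):
    """Expected 4-d regression vector that maps `proposal` onto `target`."""
    xp, yp, wp, hp = proposal
    xt, yt, wt, ht = target
    return ((xt - xp) / wp,
            (yt - yp) / hp,
            math.log(wt / wp),
            math.log(ht / hp))


def decode_bbox(proposal, deltas):
    """Apply a 4-d regression vector to a proposal to get the refined box."""
    xp, yp, wp, hp = proposal
    dx, dy, dw, dh = deltas
    return (xp + wp * dx,
            yp + hp * dy,
            wp * math.exp(dw),
            hp * math.exp(dh))


proposal = (50.0, 40.0, 20.0, 10.0)
target = (54.0, 42.0, 40.0, 5.0)
deltas = encode_bbox(proposal, target)
refined = decode_bbox(proposal, deltas)  # recovers `target` up to rounding
```

Note that the center offsets are scaled by the proposal size while the width and height live in log space, so the four components have quite different meanings. That mismatch is part of the paper's motivation for regressing points instead.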
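For the RepPoints representation:
$\mathcal{R}_r = \{(x_k + \Delta x_k,\; y_k + \Delta y_k)\}_{k=1}^{n}$
where $\{(\Delta x_k, \Delta y_k)\}_{k=1}^{n}$ are the predicted offsets of the points.
So we only need to learn the offsets and add them to the original point coordinates.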
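Point-set refinement is just a per-point addition, which the following sketch illustrates with two refinement rounds starting from a center point. The coordinates and offsets are made-up values for illustration, and `refine_points` is a hypothetical helper, not a function from the RepPoints code.

```python
def refine_points(points, offsets):
    """Add a predicted (dx, dy) offset to each (x, y) point."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, offsets)]


n = 9
center = (32.0, 32.0)
# Every point starts at the object's center point.
init_points = [center] * n

# Illustrative offsets; in the detector these come from the network.
stage1_offsets = [(dx, dy) for dy in (-4.0, 0.0, 4.0) for dx in (-6.0, 0.0, 6.0)]
stage2_offsets = [(0.5, -0.5)] * n

points_stage1 = refine_points(init_points, stage1_offsets)
points_stage2 = refine_points(points_stage1, stage2_offsets)
```

Unlike the bbox case, every regressed quantity here is an offset in the same units, so there is no scale mismatch between the components.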
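RPDet: an anchor-free detector built on RepPoints
Its pipeline is shown in the figure below:
- Center points serve as the initial representation of objects.
- Starting from each center point, deformable convolution learns the offsets of the sample points. Representing an object with 9 points, for example, corresponds to a 3 x 3 deformable convolution. The offsets are then used to regress the object's position.
- The offsets are regressed and corrected twice, which yields the final RepPoints for the object.
The main structure of the RPDet head is shown in the figure:
The localization subnet and the classification subnet both take the same image features, extracted by the FPN backbone, as input.
The key to generating RepPoints from a center point is the 3 x 3 deformable convolution in the localization subnet: the sampling positions it learns automatically (its receptive field on the object) are the points.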
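The link between the points and the deformable convolution can be sketched as follows. A deformable convolution samples at "regular kernel grid + offset", so to make a 3 x 3 kernel sample exactly at the RepPoints, the regular grid has to be subtracted from the points. This is a plain-Python illustration under that assumption; the helper names and the sample points are hypothetical, and the (y, x) ordering simply follows the head code quoted later in these notes.

```python
def dcn_base_offsets(kernel_size):
    """Regular kernel grid as (dy, dx) pairs in row-major order."""
    pad = (kernel_size - 1) // 2
    base = range(-pad, pad + 1)
    return [(float(dy), float(dx)) for dy in base for dx in base]


def to_dcn_offsets(rep_points, base):
    """Offsets the DCN needs so that each kernel tap lands on a RepPoint."""
    return [(py - by, px - bx) for (py, px), (by, bx) in zip(rep_points, base)]


base = dcn_base_offsets(3)

# Illustrative RepPoints, given as (dy, dx) relative to the center point.
rep_points = [(dy, dx) for dy in (-5.0, 0.0, 5.0) for dx in (-3.0, 0.0, 3.0)]

dcn_offsets = to_dcn_offsets(rep_points, base)

# Sampling location of each tap = regular grid position + learned offset.
sampled = [(by + oy, bx + ox) for (by, bx), (oy, ox) in zip(base, dcn_offsets)]
```

Here `sampled` equals `rep_points`, which is the sense in which the kernel's receptive field is the object's point set.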
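Three ways to convert RepPoints into a bbox:
- Min-max function. A min-max operation over both axes of the RepPoints determines Bp, which is equivalent to the bounding box over all sample points.
- Partial min-max function. The min-max operation is applied to a subset of the sample points, on each axis separately, to obtain the box.
- Moment-based function. The mean and standard deviation of the RepPoints give the center point and scale of the box Bp, where the scale is multiplied by globally shared learnable multipliers λx and λy. (This is the method the code uses by default.)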
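The three conversions can be sketched in a few lines of plain Python. This is an illustration of the descriptions above, not the mmdetection implementation: points are (x, y) tuples, boxes are (x1, y1, x2, y2), the standard deviation is the population form, the multipliers are passed in as plain numbers rather than learned, and the subset used by `partial_minmax_box` is an arbitrary choice of mine.

```python
import math


def minmax_box(points):
    """Tightest axis-aligned box around all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def partial_minmax_box(points, indices):
    """Min-max over a chosen subset of the points (subset is an assumption)."""
    return minmax_box([points[i] for i in indices])


def moment_box(points, lambda_x=1.0, lambda_y=1.0):
    """Box centered at the mean, with half-size = multiplier * std."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in points) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in points) / n)
    half_w, half_h = lambda_x * std_x, lambda_y * std_y
    return (mean_x - half_w, mean_y - half_h, mean_x + half_w, mean_y + half_h)


points = [(x, y) for y in (5.0, 15.0, 25.0) for x in (10.0, 20.0, 30.0)]
box_minmax = minmax_box(points)
box_partial = partial_minmax_box(points, indices=(0, 1, 2, 3))
box_moment = moment_box(points)
box_moment_wide = moment_box(points, lambda_x=2.0)
```

The moment-based box is smooth in every point, whereas min-max only passes gradients to the extreme points, which may be one reason it is the default.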
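Computing the loss:
- Localization loss: the RepPoints are first converted into a pseudo box, and the loss is computed between the pseudo box and the ground-truth bounding box. (The paper uses the smooth L1 distance between the top-left and bottom-right corners of the two boxes as the localization loss.)
- Classification loss: focal loss is used to handle class imbalance.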
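Both losses are easy to write down for a single sample. The sketch below is a scalar, plain-Python illustration: the `beta`, `gamma` and `alpha` defaults are borrowed from the config quoted later in these notes, while the function names, the summing over the four corner coordinates, and the absence of any coordinate normalization are my own simplifications.

```python
import math


def smooth_l1(pred, target, beta=0.11):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def corner_loss(pred_box, gt_box, beta=0.11):
    """Smooth L1 summed over (x1, y1, x2, y2) of pseudo box vs. ground truth."""
    return sum(smooth_l1(p, g, beta) for p, g in zip(pred_box, gt_box))


def sigmoid_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Focal loss for one sigmoid output; `label` is 1 (object) or 0."""
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


loc = corner_loss((10.0, 5.0, 30.0, 25.0), (10.0, 5.0, 31.0, 25.0))
easy = sigmoid_focal_loss(0.9, 1)  # confident and correct: tiny loss
hard = sigmoid_focal_loss(0.1, 1)  # confident and wrong: large loss
```

The `(1 - p_t) ** gamma` factor is what down-weights the many easy background locations, so they do not swamp the few positive ones.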
Code analysis
The RPDet code is at https://github.com/microsoft/RepPoints and has been merged into the mmdetection framework, so let's look at the mmdetection version:
Config file:
config/reppoints/reppoints_moment_r50_fpn_1x.py
# model definition
# (norm_cfg is defined earlier in the config file and is not shown in this excerpt)
model = dict(
    type='RepPointsDetector',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        num_outs=5,
        norm_cfg=norm_cfg),
    bbox_head=dict(
        type='RepPointsHead',
        num_classes=81,
        in_channels=256,
        feat_channels=256,
        point_feat_channels=256,
        stacked_convs=3,
        num_points=9,
        gradient_mul=0.1,
        point_strides=[8, 16, 32, 64, 128],
        point_base_scale=4,
        norm_cfg=norm_cfg,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5),
        loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0),
        transform_method='moment'))
The backbone is ResNet + FPN, which extracts multi-scale image features in the usual way.
Next, with the RepPoints head diagram in mind, let's see how the two subnets do their work:
mmdet/models/anchor_heads/reppoints_head.py
@HEADS.register_module
class RepPointsHead(nn.Module):

    def __init__(self, ...):  # argument list elided in these notes
        # part of the initialization is omitted here
        # we use deformable conv to extract points features
        # The DCN kernel has sqrt(num_points) taps per side, so the sampling
        # locations (receptive field) of one DCN represent the object outline.
        self.dcn_kernel = int(np.sqrt(num_points))
        self.dcn_pad = int((self.dcn_kernel - 1) / 2)
        assert self.dcn_kernel * self.dcn_kernel == num_points, \
            "The points number should be a square number."
        assert self.dcn_kernel % 2 == 1, \
            "The points number should be an odd square number."
        # base (y, x) offsets of the regular kernel grid of the deformable conv
        dcn_base = np.arange(-self.dcn_pad,
                             self.dcn_pad + 1).astype(np.float64)
        dcn_base_y = np.repeat(dcn_base, self.dcn_kernel)
        dcn_base_x = np.tile(dcn_base, self.dcn_kernel)
        dcn_base_offset = np.stack([dcn_base_y, dcn_base_x], axis=1).reshape(
            (-1))
        self.dcn_base_offset = torch.tensor(dcn_base_offset).view(1, -1, 1, 1)
        self._init_layers()
    def _init_layers(self):
        self.relu = nn.ReLU(inplace=True)
        self.cls_convs = nn.ModuleList()
        self.reg_convs = nn.ModuleList()
        # each of the two subnets stacks three 3x3 convs for feature extraction
        for i in range(self.stacked_convs):
            chn = self.in_channels if i == 0 else self.feat_channels
            self.cls_convs.append(
                ConvModule(
                    chn,
                    self.feat_channels,
                    3,
                    stride=1,
                    padding=1,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg))
            self.reg_convs.append(
                ConvModule(
                    chn,
                    self.feat_channels,
                    3,
                    stride=1,
                    padding=1,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg))
        # layers that learn the RepPoints offsets with the help of DCN
        pts_out_dim = 4 if self.use_grid_points else 2 * self.num_points
        self.reppoints_cls_conv = DeformConv(self.feat_channels,
                                             self.point_feat_channels,
                                             self.dcn_kernel, 1, self.dcn_pad)
        self.reppoints_cls_out = nn.Conv2d(self.point_feat_channels,
                                           self.cls_out_channels, 1, 1, 0)
        self.reppoints_pts_init_conv = nn.Conv2d(self.feat_channels,
                                                 self.point_feat_channels, 3,
                                                 1, 1)
        self.reppoints_pts_init_out = nn.Conv2d(self.point_feat_channels,
                                                pts_out_dim, 1, 1, 0)
        self.reppoints_pts_refine_conv = DeformConv(self.feat_channels,
                                                    self.point_feat_channels,
                                                    self.dcn_kernel, 1,
                                                    self.dcn_pad)
        self.reppoints_pts_refine_out = nn.Conv2d(self.point_feat_channels,
                                                  pts_out_dim, 1, 1, 0)
    # forward pass for a single feature level
    def forward_single(self, x):
        dcn_base_offset = self.dcn_base_offset.type_as(x)
        # If we use center_init, the initial reppoints is from center points.
        # If we use bounding bbox representation, the initial reppoints is
        # from regular grid placed on a pre-defined bbox.
        if self.use_grid_points or not self.center_init:
            scale = self.point_base_scale / 2
            points_init = dcn_base_offset / dcn_base_offset.max() * scale
            bbox_init = x.new_tensor([-scale, -scale, scale,
                                      scale]).view(1, 4, 1, 1)
        else:
            points_init = 0
        cls_feat = x
        pts_feat = x
        for cls_conv in self.cls_convs:
            cls_feat = cls_conv(cls_feat)
        for reg_conv in self.reg_convs:
            pts_feat = reg_conv(pts_feat)
        # initialize reppoints
        pts_out_init = self.reppoints_pts_init_out(
            self.relu(self.reppoints_pts_init_conv(pts_feat)))
        if self.use_grid_points:
            pts_out_init, bbox_out_init = self.gen_grid_from_reg(
                pts_out_init, bbox_init.detach())
        else:
            pts_out_init = pts_out_init + points_init
        # refine and classify reppoints
        # same values as pts_out_init, but gradients are scaled by gradient_mul
        pts_out_init_grad_mul = (1 - self.gradient_mul) * pts_out_init.detach(
        ) + self.gradient_mul * pts_out_init
        # DCN offsets are relative to the regular kernel grid
        dcn_offset = pts_out_init_grad_mul - dcn_base_offset
        cls_out = self.reppoints_cls_out(
            self.relu(self.reppoints_cls_conv(cls_feat, dcn_offset)))
        pts_out_refine = self.reppoints_pts_refine_out(
            self.relu(self.reppoints_pts_refine_conv(pts_feat, dcn_offset)))
        if self.use_grid_points:
            pts_out_refine, bbox_out_refine = self.gen_grid_from_reg(
                pts_out_refine, bbox_out_init.detach())
        else:
            pts_out_refine = pts_out_refine + pts_out_init.detach()
        return cls_out, pts_out_init, pts_out_refine
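One detail of `forward_single` that is easy to miss is the `gradient_mul` mix, `(1 - m) * x.detach() + m * x`. Its forward value equals `x`, but only the `m * x` term carries gradient, so the refinement and classification branches push back on the initial points with a factor of `m` (0.1 in the config). The sketch below imitates this without autograd: the detached copy is modeled as a separate constant argument and the gradient is checked by finite differences. Both helper functions are hypothetical illustrations, not part of mmdetection.

```python
def grad_mul_mix(x, x_detached, gradient_mul):
    """Mirror of `(1 - m) * x.detach() + m * x`, with the detached copy
    passed in explicitly as a constant."""
    return (1.0 - gradient_mul) * x_detached + gradient_mul * x


def finite_diff_grad(x0, gradient_mul, eps=1e-6):
    """Numerical d(out)/dx at x0, perturbing only the non-detached input."""
    hi = grad_mul_mix(x0 + eps, x0, gradient_mul)
    lo = grad_mul_mix(x0 - eps, x0, gradient_mul)
    return (hi - lo) / (2.0 * eps)


value = grad_mul_mix(3.0, 3.0, 0.1)  # forward value is unchanged
grad = finite_diff_grad(3.0, 0.1)    # gradient is scaled down to about 0.1
```

My reading is that this keeps the initial points mostly under the control of their own localization loss, while still letting a small signal flow back from the later stage.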
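Summary and tips
As I understand it, this paper is essentially an application of deformable convolution to object detection: the localization and classification losses supervise the offsets that the deformable convolution learns for each object, which makes what the convolution learns interpretable. It suggests that deformable convolution could be driven by other kinds of supervision as well.
How RepPoints handles several objects occluding one another at the same location: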
In RPDet, we show that this issue can be greatly alleviated by using the FPN structure [24] for the following reasons: first, objects of different scales will be assigned to different image feature levels, which addresses objects of different scales and the same center points locations; second, FPN has a high-resolution feature map for small objects, which also reduces the chance of two objects having centers located at the same feature position.
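The authors argue that the FPN structure resolves this by assigning objects of different scales to different feature levels.
Whether this is enough for cases such as pedestrian detection, where several pedestrians of similar size occlude one another, remains open to question.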