目標(biāo)檢測(cè)算法之SSD代碼解析(萬字長(zhǎng)文超詳細(xì))

前言

前面的推文已經(jīng)介紹過SSD算法，我覺得原理說的還算清楚了件舵，但是一個(gè)算法不深入到代碼去理解是完全不夠的往声。因此本篇文章是在上篇SSD算法原理解析的基礎(chǔ)上做的代碼解析埂奈，解析SSD算法原理的推文的地址如下：https://mp.weixin.qq.com/s/lXqobT45S1wz-evc7KO5DA。今天要解析的SSD源碼來自于github一個(gè)非衬斡Γ火的Pytorch實(shí)現(xiàn)澜掩，已經(jīng)有3K+星，地址為：https://github.com/amdegroot/ssd.pytorch/

網(wǎng)絡(luò)結(jié)構(gòu)

為了比較好的對(duì)應(yīng)SSD的結(jié)構(gòu)來看代碼杖挣，我們首先放出SSD的網(wǎng)絡(luò)結(jié)構(gòu)肩榕，如下圖所示：

在這里插入圖片描述

可以看到原始的SSD網(wǎng)絡(luò)是以VGG-16作Backbone（骨干網(wǎng)絡(luò)）的。為了更加清晰看到相比于VGG16惩妇，SSD的網(wǎng)絡(luò)使用了哪些變化株汉，知乎上的一個(gè)帖子做了一個(gè)非常清晰的圖，這里借用一下歌殃，原圖地址為：https://zhuanlan.zhihu.com/p/79854543 乔妈。帶有特征圖維度信息的更清晰的骨干網(wǎng)絡(luò)和VGG16的對(duì)比圖如下：

在這里插入圖片描述

源碼解析

OK，現(xiàn)在我們就要開始從源碼剖析SSD了氓皱。主要弄清楚三個(gè)方面路召，網(wǎng)絡(luò)結(jié)構(gòu)的搭建，Anchor還有損失函數(shù)波材，就算是理解這個(gè)源碼了股淡。

網(wǎng)絡(luò)搭建

從上面的圖中我們可以清晰的看到在以VGG16做骨干網(wǎng)絡(luò)時(shí)，在conv5后丟棄了CGG16中的全連接層改為了 $1024\times 3\times 3$ 和 $1024\times1\times1$ 的卷積層廷区。其中conv4-1卷積層前面的maxpooling層的ceil_model=True唯灵，使得輸出特征圖長(zhǎng)寬為 $38\times 38$ 。還有conv5-3后面的一層maxpooling層參數(shù)為 $(kernelsize=3,stride=1,padding=1)$ 隙轻，不進(jìn)行下采樣早敬。然后在fc7后面接上多尺度提取的另外4個(gè)卷積層就構(gòu)成了完整的SSD網(wǎng)絡(luò)。這里VGG16修改后的代碼如下大脉，來自ssd.py：

def vgg(cfg, i, batch_norm=False):
    layers = []
    in_channels = i
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        elif v == 'C':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
    conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
    conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
    layers += [pool5, conv6,
               nn.ReLU(inplace=True), conv7, nn.ReLU(inplace=True)]
    return layers

可以看到和我們上面的那張圖是完全一致的搞监。代碼里面最后獲得的conv7就是我們上面圖里面的fc7，特征維度是： $[None,1024,19,19]$ 镰矿。
現(xiàn)在可以開始搭建SSD網(wǎng)絡(luò)后面的多尺度提取網(wǎng)絡(luò)了琐驴。也就是網(wǎng)絡(luò)結(jié)構(gòu)圖中的Extra Feature Layers。我們從開篇的結(jié)構(gòu)圖中截取一下這一部分，方便我們對(duì)照代碼绝淡。

在這里插入圖片描述

實(shí)現(xiàn)的代碼如下（同樣來自ssd.py）：

def add_extras(cfg, i, batch_norm=False):
    # Extra layers added to VGG for feature scaling
    layers = []
    in_channels = i
    flag = False #flag 用來控制 kernel_size= 1 or 3
    for k, v in enumerate(cfg):
        if in_channels != 'S':
            if v == 'S':
                layers += [nn.Conv2d(in_channels, cfg[k + 1],
                           kernel_size=(1, 3)[flag], stride=2, padding=1)]
            else:
                layers += [nn.Conv2d(in_channels, v, kernel_size=(1, 3)[flag])]
            flag = not flag
        in_channels = v
return layers

可以看到網(wǎng)絡(luò)結(jié)構(gòu)中除了魔改后的VGG16和Extra Layers還有6個(gè)橫著的線宙刘，這代表的是對(duì)6個(gè)尺度的特征圖進(jìn)行卷積獲得預(yù)測(cè)框的回歸(loc)和類別(cls)信息，注意SSD將背景也看成類別了牢酵，所以對(duì)于VOC數(shù)據(jù)集類別數(shù)就是20+1=21悬包。這部分的代碼為：

def multibox(vgg, extra_layers, cfg, num_classes):
    loc_layers = []#多尺度分支的回歸網(wǎng)絡(luò)
    conf_layers = []#多尺度分支的分類網(wǎng)絡(luò)
    # 第一部分，vgg 網(wǎng)絡(luò)的 Conv2d-4_3(21層)馍乙， Conv2d-7_1(-2層)
    vgg_source = [21, -2]
    for k, v in enumerate(vgg_source):
        # 回歸 box*4(坐標(biāo))
        loc_layers += [nn.Conv2d(vgg[v].out_channels,
                                 cfg[k] * 4, kernel_size=3, padding=1)]
        # 置信度 box*(num_classes)
        conf_layers += [nn.Conv2d(vgg[v].out_channels,
                        cfg[k] * num_classes, kernel_size=3, padding=1)]
    # 第二部分布近，cfg從第三個(gè)開始作為box的個(gè)數(shù)，而且用于多尺度提取的網(wǎng)絡(luò)分別為1,3,5,7層
    for k, v in enumerate(extra_layers[1::2], 2):
        loc_layers += [nn.Conv2d(v.out_channels, cfg[k]
                                 * 4, kernel_size=3, padding=1)]
        conf_layers += [nn.Conv2d(v.out_channels, cfg[k]
                                  * num_classes, kernel_size=3, padding=1)]
    return vgg, extra_layers, (loc_layers, conf_layers)
# 用下面的測(cè)試代碼測(cè)試一下
if __name__  == "__main__":
    vgg, extra_layers, (l, c) = multibox(vgg(base['300'], 3),
                                         add_extras(extras['300'], 1024),
                                         [4, 6, 6, 6, 4, 4], 21)
    print(nn.Sequential(*l))
    print('---------------------------')
    print(nn.Sequential(*c))

在jupter notebook輸出信息為：

'''
loc layers: 
'''
Sequential(
  (0): Conv2d(512, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): Conv2d(1024, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (2): Conv2d(512, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): Conv2d(256, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Conv2d(256, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): Conv2d(256, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
---------------------------
'''
conf layers: 
''' 
Sequential(
  (0): Conv2d(512, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): Conv2d(1024, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (2): Conv2d(512, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Conv2d(256, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): Conv2d(256, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)

Anchor生成(Prior_Box層)

這個(gè)在前面SSD的原理篇中講過了丝格，這里不妨再回憶一下撑瞧，SSD從魔改后的VGG16的conv4_3開始一共使用了6個(gè)不同大小的特征圖，大小分別為(38,28),(19,19),(10,10),(5,5),(3,3),(1,1)显蝌，但每個(gè)特征圖上設(shè)置的先驗(yàn)框(Anchor)的數(shù)量不同预伺。先驗(yàn)框的設(shè)置包含尺度和長(zhǎng)寬比兩個(gè)方面。對(duì)于先驗(yàn)框的設(shè)置曼尊，公式如下：
$s_k=s_{min}+\frac{s_{max}-s_{min}}{m-1}(k-1),k\in [1,m]$ 酬诀，其中 $M$ 指的是特征圖個(gè)數(shù)，這里為5骆撇，因?yàn)榈谝粚?code>conv4_3的Anchor是單獨(dú)設(shè)置的料滥， $s_k$ 代表先驗(yàn)框大小相對(duì)于特征圖的比例，注意這里不是相對(duì)原圖哦艾船。最后葵腹， $s_{min}$ 和 $s_{max}$ 表示比例的最小值和最大值，論文中分別取 $0.2$ 和 $0.9$ 屿岂。
對(duì)于第一個(gè)特征圖践宴，它的先驗(yàn)框尺度比例設(shè)置為 $s_{min}/2=0.1$ ，則他的尺度為 $300\times 0.1=30$ 爷怀，后面的特征圖帶入公式計(jì)算阻肩，并將其映射會(huì)原圖300的大小可以得到，剩下的5個(gè)特征圖的尺度 $s_k$ 為 ${60,111,162,213,264}$ 运授。所以綜合起來烤惊，6個(gè)特征圖的尺度 $s_k$ 為 ${30,60,111,162,213,264}$ 。有了Anchor的尺度吁朦，接下來設(shè)置Anchor的長(zhǎng)寬柒室，論文中長(zhǎng)寬設(shè)置一般為 $a_r={1,2,3,\frac{1}{2},\frac{1}{3}}$ ，根據(jù)面積和長(zhǎng)寬比可以得到先驗(yàn)框的寬度和高度：
$w_k^a=s_k\sqrt{a_r},h_k^a=s_k/\sqrt{a_r}$ 逗宜。
這里有一些值得注意的點(diǎn)雄右，如下：

上面的 $s_k$ 是相對(duì)于原圖的大小空骚。
默認(rèn)情況下千元，每個(gè)特征圖除了上面5個(gè)比例的Anchor圈盔，還會(huì)設(shè)置一個(gè)尺度為 $s_k^{'}=\sqrt{s_ks_{k+1}}$ 且 $a_r=1$ 的先驗(yàn)框，這樣每個(gè)特征圖都設(shè)置了兩個(gè)長(zhǎng)寬比為1但大小不同的正方形先驗(yàn)框付魔。最后一個(gè)特征圖需要參考一下 $s_{m+1}=315$ 來計(jì)算 $s_m$ 逢渔。
在實(shí)現(xiàn)conv4_3,conv10_2,conv11_2層時(shí)僅使用4個(gè)先驗(yàn)框肋坚，不使用長(zhǎng)寬比為 $3,\frac{1}{3}$ 的Anchor。
每個(gè)單元的先驗(yàn)框中心點(diǎn)分布在每個(gè)單元的中心肃廓，即：
$[\frac{i+0.5}{|f_k|},\frac{j+0.5}{|f_k|}],i,j\in[0,|f_k|]$ 智厌，其中 $f_k$ 是特征圖的大小。

從Anchor的值來看亿昏，越前面的特征圖Anchor的尺寸越小，也就是說對(duì)小目標(biāo)的效果越好档礁。先驗(yàn)框的總數(shù)為num_priors = 38x38x4+19x19x6+10x10x6+5x5x6+3x3x4+1x1x4=8732角钩。

生成先驗(yàn)框的代碼如下（來自layers/functions/prior_box.py）

class PriorBox(object):
    """Compute priorbox coordinates in center-offset form for each source
    feature map.
    """
    def __init__(self, cfg):
        super(PriorBox, self).__init__()
        self.image_size = cfg['min_dim']
        # number of priors for feature map location (either 4 or 6)
        self.num_priors = len(cfg['aspect_ratios'])
        self.variance = cfg['variance'] or [0.1]
        self.feature_maps = cfg['feature_maps']
        self.min_sizes = cfg['min_sizes']
        self.max_sizes = cfg['max_sizes']
        self.steps = cfg['steps']
        self.aspect_ratios = cfg['aspect_ratios']
        self.clip = cfg['clip']
        self.version = cfg['name']
        for v in self.variance:
            if v <= 0:
                raise ValueError('Variances must be greater than 0')

    def forward(self):
        mean = []
        # 遍歷多尺度的 特征圖: [38, 19, 10, 5, 3, 1]
        for k, f in enumerate(self.feature_maps):
            # 遍歷每個(gè)像素
            for i, j in product(range(f), repeat=2):
                # k-th 層的feature map 大小
                f_k = self.image_size / self.steps[k]
                # # 每個(gè)框的中心坐標(biāo)
                cx = (j + 0.5) / f_k
                cy = (i + 0.5) / f_k

                # aspect_ratio: 1 當(dāng) ratio==1的時(shí)候，會(huì)產(chǎn)生兩個(gè) box
                # r==1, size = s_k呻澜， 正方形
                s_k = self.min_sizes[k]/self.image_size
                mean += [cx, cy, s_k, s_k]

                # r==1, size = sqrt(s_k * s_(k+1)), 正方形
                # rel size: sqrt(s_k * s_(k+1))
                s_k_prime = sqrt(s_k * (self.max_sizes[k]/self.image_size))
                mean += [cx, cy, s_k_prime, s_k_prime]

                # 當(dāng) ratio != 1 的時(shí)候递礼，產(chǎn)生的box為矩形
                for ar in self.aspect_ratios[k]:
                    mean += [cx, cy, s_k*sqrt(ar), s_k/sqrt(ar)]
                    mean += [cx, cy, s_k/sqrt(ar), s_k*sqrt(ar)]
        # 轉(zhuǎn)化為 torch的Tensor
        output = torch.Tensor(mean).view(-1, 4)
        #歸一化，把輸出設(shè)置在 [0,1]
        if self.clip:
            output.clamp_(max=1, min=0)
return output

網(wǎng)絡(luò)結(jié)構(gòu)

結(jié)合了前面介紹的魔改后的VGG16羹幸，還有Extra Layers脊髓，還有生成Anchor的Priobox策略，我們可以寫出SSD的整體結(jié)構(gòu)如下（代碼在ssd.py）：

class SSD(nn.Module):
    """Single Shot Multibox Architecture
    The network is composed of a base VGG network followed by the
    added multibox conv layers.  Each multibox layer branches into
        1) conv2d for class conf scores
        2) conv2d for localization predictions
        3) associated priorbox layer to produce default bounding
           boxes specific to the layer's feature map size.
    See: https://arxiv.org/pdf/1512.02325.pdf for more details.
    Args:
        phase: (string) Can be "test" or "train"
        size: input image size
        base: VGG16 layers for input, size of either 300 or 500
        extras: extra layers that feed to multibox loc and conf layers
        head: "multibox head" consists of loc and conf conv layers
    """

    def __init__(self, phase, size, base, extras, head, num_classes):
        super(SSD, self).__init__()
        self.phase = phase
        self.num_classes = num_classes
        # 配置config
        self.cfg = (coco, voc)[num_classes == 21]
        # 初始化先驗(yàn)框
        self.priorbox = PriorBox(self.cfg)
        self.priors = Variable(self.priorbox.forward(), volatile=True)
        self.size = size

        # SSD network
        # backbone網(wǎng)絡(luò)
        self.vgg = nn.ModuleList(base)
        # Layer learns to scale the l2 normalized features from conv4_3
        # conv4_3后面的網(wǎng)絡(luò)栅受，L2 正則化
        self.L2Norm = L2Norm(512, 20)
        self.extras = nn.ModuleList(extras)
        # 回歸和分類網(wǎng)絡(luò)
        self.loc = nn.ModuleList(head[0])
        self.conf = nn.ModuleList(head[1])

        if phase == 'test':
            self.softmax = nn.Softmax(dim=-1)
            self.detect = Detect(num_classes, 0, 200, 0.01, 0.45)

    def forward(self, x):
        """Applies network layers and ops on input image(s) x.
        Args:
            x: input image or batch of images. Shape: [batch,3,300,300].
        Return:
            Depending on phase:
            test:
                Variable(tensor) of output class label predictions,
                confidence score, and corresponding location predictions for
                each object detected. Shape: [batch,topk,7]
            train:
                list of concat outputs from:
                    1: confidence layers, Shape: [batch*num_priors,num_classes]
                    2: localization layers, Shape: [batch,num_priors*4]
                    3: priorbox layers, Shape: [2,num_priors*4]
        """
        sources = list()
        loc = list()
        conf = list()

        # apply vgg up to conv4_3 relu
        # vgg網(wǎng)絡(luò)到conv4_3
        for k in range(23):
            x = self.vgg[k](x)
        # l2 正則化
        s = self.L2Norm(x)
        sources.append(s)

        # apply vgg up to fc7
        # conv4_3 到 fc
        for k in range(23, len(self.vgg)):
            x = self.vgg[k](x)
        sources.append(x)

        # apply extra layers and cache source layer outputs
        # extras 網(wǎng)絡(luò)
        for k, v in enumerate(self.extras):
            x = F.relu(v(x), inplace=True)
            if k % 2 == 1:
                # 把需要進(jìn)行多尺度的網(wǎng)絡(luò)輸出存入 sources
                sources.append(x)

        # apply multibox head to source layers
        # 多尺度回歸和分類網(wǎng)絡(luò)
        for (x, l, c) in zip(sources, self.loc, self.conf):
            loc.append(l(x).permute(0, 2, 3, 1).contiguous())
            conf.append(c(x).permute(0, 2, 3, 1).contiguous())

        loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
        conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
        if self.phase == "test":
            output = self.detect(
                loc.view(loc.size(0), -1, 4),                   # loc preds
                self.softmax(conf.view(conf.size(0), -1,
                             self.num_classes)),                # conf preds
                self.priors.type(type(x.data))                  # default boxes
            )
        else:
            output = (
                # loc的輸出将硝，size:(batch, 8732, 4)
                loc.view(loc.size(0), -1, 4),
                # conf的輸出，size:(batch, 8732, 21)
                conf.view(conf.size(0), -1, self.num_classes),
                # 生成所有的候選框 size([8732, 4])
                self.priors
            )
        return output
    # 加載模型參數(shù)
    def load_weights(self, base_file):
        other, ext = os.path.splitext(base_file)
        if ext == '.pkl' or '.pth':
            print('Loading weights into state dict...')
            self.load_state_dict(torch.load(base_file,
                                 map_location=lambda storage, loc: storage))
            print('Finished!')
        else:
            print('Sorry only .pth and .pkl files supported.')

然后為了增加可讀性屏镊，重新封裝了一下依疼，代碼如下：

base = {
    '300': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
            512, 512, 512],
    '512': [],
}
extras = {
    '300': [256, 'S', 512, 128, 'S', 256, 128, 256, 128, 256],
    '512': [],
}
mbox = {
    '300': [4, 6, 6, 6, 4, 4],  # number of boxes per feature map location
    '512': [],
}


def build_ssd(phase, size=300, num_classes=21):
    if phase != "test" and phase != "train":
        print("ERROR: Phase: " + phase + " not recognized")
        return
    if size != 300:
        print("ERROR: You specified size " + repr(size) + ". However, " +
              "currently only SSD300 (size=300) is supported!")
        return
    # 調(diào)用multibox，生成vgg,extras,head
    base_, extras_, head_ = multibox(vgg(base[str(size)], 3),
                                     add_extras(extras[str(size)], 1024),
                                     mbox[str(size)], num_classes)
    return SSD(phase, size, base_, extras_, head_, num_classes)

Loss解析

SSD的損失函數(shù)包含兩個(gè)部分而芥，一個(gè)是定位損失 $L_{loc}$ 律罢，一個(gè)是分類損失 $L_{conf}$ ，整個(gè)損失函數(shù)表達(dá)如下：
$L(x,c,l,g)=\frac{1}{N}(L_{conf}(x,c)+\alpha L_{loc}(x,l,g))$
其中棍丐， $N$ 是先驗(yàn)框的正樣本數(shù)量误辑， $c$ 是類別置信度預(yù)測(cè)值， $l$ 是先驗(yàn)框?qū)?yīng)的邊界框預(yù)測(cè)值歌逢， $g$ 是ground truth的位置參數(shù)巾钉， $x$ 代表網(wǎng)絡(luò)的預(yù)測(cè)值。對(duì)于位置損失秘案，采用Smooth L1 Loss睛琳，位置信息都是encode之后的數(shù)值盒蟆，后面會(huì)講這個(gè)encode的過程。而對(duì)于分類損失师骗，首先需要使用hard negtive mining將正負(fù)樣本按照1:3 的比例把負(fù)樣本抽樣出來历等，抽樣的方法是：針對(duì)所有batch的confidence，按照置信度誤差進(jìn)行降序排列辟癌，取出前top_k個(gè)負(fù)樣本寒屯。損失函數(shù)可以用下圖表示：

在這里插入圖片描述

實(shí)現(xiàn)步驟

Reshape所有batch中的conf，即代碼中的batch_conf = conf_data.view(-1, self.num_classes)黍少，方便后續(xù)排序寡夹。
置信度誤差越大，實(shí)際上就是預(yù)測(cè)背景的置信度越小厂置。
把所有conf進(jìn)行logsoftmax處理(均為負(fù)值)菩掏，預(yù)測(cè)的置信度越小，則logsoftmax越小昵济，取絕對(duì)值智绸，則|logsoftmax|越大，降序排列-logsoftmax访忿，取前top_k的負(fù)樣本瞧栗。
其中，log_sum_exp函數(shù)的代碼如下：

def log_sum_exp(x):
    x_max = x.detach().max()
    return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True))+x_max

分類損失conf_logP函數(shù)如下：

conf_logP = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

這樣計(jì)算的原因主要是為了增強(qiáng)logsoftmax損失的數(shù)值穩(wěn)定性海铆。放一張我的手推圖：

在這里插入圖片描述

損失函數(shù)完整代碼實(shí)現(xiàn)迹恐，來自layers/modules/multibox_loss.py：

class MultiBoxLoss(nn.Module):
    """SSD Weighted Loss Function
    Compute Targets:
        1) Produce Confidence Target Indices by matching  ground truth boxes
           with (default) 'priorboxes' that have jaccard index > threshold parameter
           (default threshold: 0.5).
        2) Produce localization target by 'encoding' variance into offsets of ground
           truth boxes and their matched  'priorboxes'.
        3) Hard negative mining to filter the excessive number of negative examples
           that comes with using a large number of default bounding boxes.
           (default negative:positive ratio 3:1)
    Objective Loss:
        L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
        weighted by α which is set to 1 by cross val.
        Args:
            c: class confidences,
            l: predicted boxes,
            g: ground truth boxes
            N: number of matched default boxes
        See: https://arxiv.org/pdf/1512.02325.pdf for more details.
    """

    def __init__(self, num_classes, overlap_thresh, prior_for_matching,
                 bkg_label, neg_mining, neg_pos, neg_overlap, encode_target,
                 use_gpu=True):
        super(MultiBoxLoss, self).__init__()
        self.use_gpu = use_gpu
        self.num_classes = num_classes
        self.threshold = overlap_thresh
        self.background_label = bkg_label
        self.encode_target = encode_target
        self.use_prior_for_matching = prior_for_matching
        self.do_neg_mining = neg_mining
        self.negpos_ratio = neg_pos
        self.neg_overlap = neg_overlap
        self.variance = cfg['variance']

    def forward(self, predictions, targets):
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds, conf preds,
            and prior boxes from SSD net.
                conf shape: torch.size(batch_size,num_priors,num_classes)
                loc shape: torch.size(batch_size,num_priors,4)
                priors shape: torch.size(num_priors,4)
            targets (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size,num_objs,5] (last idx is the label).
        """
        loc_data, conf_data, priors = predictions
        num = loc_data.size(0)# batch_size
        priors = priors[:loc_data.size(1), :]
        num_priors = (priors.size(0)) # 先驗(yàn)框個(gè)數(shù)
        num_classes = self.num_classes #類別數(shù)

        # match priors (default boxes) and ground truth boxes
        # 獲取匹配每個(gè)prior box的 ground truth
        # 創(chuàng)建 loc_t 和 conf_t 保存真實(shí)box的位置和類別
        loc_t = torch.Tensor(num, num_priors, 4)
        conf_t = torch.LongTensor(num, num_priors)
        for idx in range(num):
            truths = targets[idx][:, :-1].data #ground truth box信息
            labels = targets[idx][:, -1].data # ground truth conf信息
            defaults = priors.data # priors的 box 信息
            # 匹配 ground truth
            match(self.threshold, truths, defaults, self.variance, labels,
                  loc_t, conf_t, idx)
        if self.use_gpu:
            loc_t = loc_t.cuda()
            conf_t = conf_t.cuda()
        # wrap targets
        loc_t = Variable(loc_t, requires_grad=False)
        conf_t = Variable(conf_t, requires_grad=False)
        # 匹配中所有的正樣本mask,shape[b,M]
        pos = conf_t > 0
        num_pos = pos.sum(dim=1, keepdim=True)
        # Localization Loss,使用 Smooth L1
        # shape[b,M]-->shape[b,M,4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data[pos_idx].view(-1, 4) #預(yù)測(cè)的正樣本box信息
        loc_t = loc_t[pos_idx].view(-1, 4) #真實(shí)的正樣本box信息
        loss_l = F.smooth_l1_loss(loc_p, loc_t, size_average=False) #Smooth L1 損失
        
        '''
        Target；
            下面進(jìn)行hard negative mining
        過程:
            1卧斟、 針對(duì)所有batch的conf殴边，按照置信度誤差(預(yù)測(cè)背景的置信度越小，誤差越大)進(jìn)行降序排列;
            2珍语、 負(fù)樣本的label全是背景找都，那么利用log softmax 計(jì)算出logP,
               logP越大，則背景概率越低,誤差越大;
            3廊酣、 選取誤差交大的top_k作為負(fù)樣本能耻，保證正負(fù)樣本比例接近1:3;
        '''
        # Compute max conf across batch for hard negative mining
        # shape[b*M,num_classes]
        batch_conf = conf_data.view(-1, self.num_classes)
        # 使用logsoftmax，計(jì)算置信度,shape[b*M, 1]
        loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

        # Hard Negative Mining
        loss_c[pos] = 0  # 把正樣本排除亡驰，剩下的就全是負(fù)樣本晓猛，可以進(jìn)行抽樣
        loss_c = loss_c.view(num, -1)# shape[b, M]
        # 兩次sort排序，能夠得到每個(gè)元素在降序排列中的位置idx_rank
        _, loss_idx = loss_c.sort(1, descending=True)
        _, idx_rank = loss_idx.sort(1)
         # 抽取負(fù)樣本
        # 每個(gè)batch中正樣本的數(shù)目凡辱，shape[b,1]
        num_pos = pos.long().sum(1, keepdim=True)
        num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
        # 抽取前top_k個(gè)負(fù)樣本戒职，shape[b, M]
        neg = idx_rank < num_neg.expand_as(idx_rank)

        # Confidence Loss Including Positive and Negative Examples
        # shape[b,M] --> shape[b,M,num_classes]
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        # 提取出所有篩選好的正負(fù)樣本(預(yù)測(cè)的和真實(shí)的)
        conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, self.num_classes)
        targets_weighted = conf_t[(pos+neg).gt(0)]
        # 計(jì)算conf交叉熵
        loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)

        # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        # 正樣本個(gè)數(shù)
        N = num_pos.data.sum()
        loss_l /= N
        loss_c /= N
        return loss_l, loss_c

先驗(yàn)框匹配策略

上面的代碼中還有一個(gè)地方?jīng)]講到，就是match函數(shù)透乾。這是SSD算法的先驗(yàn)框匹配函數(shù)洪燥。在訓(xùn)練時(shí)首先需要確定訓(xùn)練圖片中的ground truth是由哪一個(gè)先驗(yàn)框來匹配磕秤，與之匹配的先驗(yàn)框所對(duì)應(yīng)的邊界框?qū)⒇?fù)責(zé)預(yù)測(cè)它。SSD的先驗(yàn)框和ground truth匹配原則主要有2點(diǎn)捧韵。第一點(diǎn)是對(duì)于圖片中的每個(gè)ground truth市咆，找到和它IOU最大的先驗(yàn)框，該先驗(yàn)框與其匹配再来，這樣可以保證每個(gè)ground truth一定與某個(gè)prior匹配蒙兰。第二點(diǎn)是對(duì)于剩余的未匹配的先驗(yàn)框，若某個(gè)ground truth和它的IOU大于某個(gè)閾值(一般設(shè)為0.5)芒篷，那么改prior和這個(gè)ground truth搜变，剩下沒有匹配上的先驗(yàn)框都是負(fù)樣本（如果多個(gè)ground truth和某一個(gè)先驗(yàn)框的IOU均大于閾值，那么prior只與IOU最大的那個(gè)進(jìn)行匹配）针炉。代碼實(shí)現(xiàn)如下挠他，來自layers/box_utils.py：

def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    """把和每個(gè)prior box 有最大的IOU的ground truth box進(jìn)行匹配，
    同時(shí)篡帕，編碼包圍框殖侵，返回匹配的索引，對(duì)應(yīng)的置信度和位置
    Args:
        threshold: IOU閾值赂苗，小于閾值設(shè)為背景
        truths: ground truth boxes, shape[N,4]
        priors: 先驗(yàn)框愉耙， shape[M,4]
        variances: prior的方差, list(float)
        labels: 圖片的所有類別贮尉，shape[num_obj]
        loc_t: 用于填充encoded loc 目標(biāo)張量
        conf_t: 用于填充encoded conf 目標(biāo)張量
        idx: 現(xiàn)在的batch index        
        The matched indices corresponding to 1)location and 2)confidence preds.
    """
    # jaccard index
    # 計(jì)算IOU
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] 和每個(gè)ground truth box 交集最大的 prior box
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # [1,num_priors] 和每個(gè)prior box 交集最大的 ground truth box
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0) #M
    best_truth_overlap.squeeze_(0) #M
    best_prior_idx.squeeze_(1) #N
    best_prior_overlap.squeeze_(1) #N
    # 保證每個(gè)ground truth box 與某一個(gè)prior box 匹配拌滋，固定值為 2 > threshold
    best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    # 保證每一個(gè)ground truth 匹配它的都是具有最大IOU的prior
    # 根據(jù) best_prior_dix 鎖定 best_truth_idx里面的最大IOU prior
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]          # 提取出所有匹配的ground truth box, Shape: [M,4]    
    conf = labels[best_truth_idx] + 1         # 提取出所有GT框的類別， Shape:[M]   
    # 把 iou < threshold 的框類別設(shè)置為 bg,即為0
    conf[best_truth_overlap < threshold] = 0  # label as background
    # 編碼包圍框
    loc = encode(matches, priors, variances)
    # 保存匹配好的loc和conf到loc_t和conf_t中
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior

位置坐標(biāo)轉(zhuǎn)換

我們看到上面出現(xiàn)了一個(gè)point_form函數(shù)猜谚，這是什么意思呢败砂？這是因?yàn)槟繕?biāo)框有2種表示方式：

$(x_{min},y_{min},x_{max},y_{max})$
$(x,y,w,h)$
這部分的代碼在layers/box_utils.py下：

def point_form(boxes):
    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
   把 prior_box (cx, cy, w, h)轉(zhuǎn)化為(xmin, ymin, xmax, ymax)
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                     boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax


def center_size(boxes):
    """ Convert prior_boxes to (cx, cy, w, h)
    把 prior_box (xmin, ymin, xmax, ymax) 轉(zhuǎn)化為 (cx, cy, w, h)
    """
    return torch.cat((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                            boxes[:, 2:] - boxes[:, :2], 1) # w, h

IOU計(jì)算

這部分比較簡(jiǎn)單，對(duì)于兩個(gè)Box來講魏铅，首先計(jì)算兩個(gè)box左上角點(diǎn)坐標(biāo)的最大值和右下角坐標(biāo)的最小值昌犹，然后計(jì)算交集面積，最后把交集面積除以對(duì)應(yīng)的并集面積览芳。代碼仍在layers/box_utils.py：

def intersect(box_a, box_b):
    """ We resize both tensors to [A,B,2] without new malloc:
    [A,2] -> [A,1,2] -> [A,B,2]
    [B,2] -> [1,B,2] -> [A,B,2]
    Then we compute the area of intersect between box_a and box_b.
    Args:
      box_a: (tensor) bounding boxes, Shape: [A,4].
      box_b: (tensor) bounding boxes, Shape: [B,4].
    Return:
      (tensor) intersection area, Shape: [A,B].
    """
    A = box_a.size(0)
    B = box_b.size(0)
     # 右下角斜姥，選出最小值
    max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
                       box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
    # 左上角，選出最大值
    min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2),
                       box_b[:, :2].unsqueeze(0).expand(A, B, 2))
    # 負(fù)數(shù)用0截?cái)嗖拙梗瑸?代表交集為0
    inter = torch.clamp((max_xy - min_xy), min=0)
    return inter[:, :, 0] * inter[:, :, 1]


def jaccard(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
    is simply the intersection over union of two boxes.  Here we operate on
    ground truth boxes and default boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    Args:
        box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4]
        box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4]
    Return:
        jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)]
    """
    inter = intersect(box_a, box_b)# A∩B
     # box_a和box_b的面積
    area_a = ((box_a[:, 2]-box_a[:, 0]) *
              (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]#(N,)
    area_b = ((box_b[:, 2]-box_b[:, 0]) *
              (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]#(M,)
    union = area_a + area_b - inter
    return inter / union  # [A,B]

L2標(biāo)準(zhǔn)化

VGG16的conv4_3特征圖的大小為 $38\times 38$ 铸敏，網(wǎng)絡(luò)層靠前，方差比較大悟泵，需要加一個(gè)L2標(biāo)準(zhǔn)化杈笔，以保證和后面的檢測(cè)層差異不是很大。L2標(biāo)準(zhǔn)化的公式如下：
$\hat{x}=\frac{x}{||x||^2}$ 糕非，其中 $x=(x_1...x_d)||x||_2=(\sum_{i=1}^d|x_i|^2)^{1/2}$ 蒙具。同時(shí)球榆，這里還要注意的是如果簡(jiǎn)單的對(duì)一個(gè)layer的輸入進(jìn)行L2標(biāo)準(zhǔn)化就會(huì)改變?cè)搶拥囊?guī)模，并且會(huì)減慢學(xué)習(xí)速度禁筏，因此這里引入了一個(gè)縮放系數(shù) $\gamma_i$
持钉，對(duì)于每一個(gè)通道l2標(biāo)準(zhǔn)化后的結(jié)果為：
$y_i=\gamma_i\hat{x_i}$ ，通常 $scale$ 的值設(shè)10或者20融师，效果比較好右钾。代碼來自layers/modules/l2norm.py。

class L2Norm(nn.Module):
    '''
    conv4_3特征圖大小38x38旱爆，網(wǎng)絡(luò)層靠前舀射，norm較大，需要加一個(gè)L2 Normalization,以保證和后面的檢測(cè)層差異不是很大怀伦，具體可以參考： ParseNet脆烟。這個(gè)前面的推文里面有講。
    '''
    def __init__(self, n_channels, scale):
        super(L2Norm, self).__init__()
        self.n_channels = n_channels
        self.gamma = scale or None
        self.eps = 1e-10
        # 將一個(gè)不可訓(xùn)練的類型Tensor轉(zhuǎn)換成可以訓(xùn)練的類型 parameter
        self.weight = nn.Parameter(torch.Tensor(self.n_channels))
        self.reset_parameters()

    # 初始化參數(shù)    
    def reset_parameters(self):
        nn.init.constant_(self.weight, self.gamma)

    def forward(self, x):
        # 計(jì)算x的2范數(shù)
        norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() # shape[b,1,38,38]
        x = x / norm   # shape[b,512,38,38]

        # 擴(kuò)展self.weight的維度為shape[1,512,1,1]房待，然后參考公式計(jì)算
        out = self.weight[None,...,None,None] * x
        return out

位置信息編解碼

上面提到了計(jì)算坐標(biāo)損失的時(shí)候邢羔，坐標(biāo)是encoding之后的，這是怎么回事呢桑孩？根據(jù)論文的描述拜鹤，預(yù)測(cè)框和ground truth邊界框存在一個(gè)轉(zhuǎn)換關(guān)系，先定義一些變量：

先驗(yàn)框位置： $d=(d^{cx},d^{cy},d^w,d^h)$
ground truth框位置： $g=(g^{cx},g^{cy},g^w,g^h)$
variance是先驗(yàn)框的坐標(biāo)方差流椒。
然后編碼的過程可以表示為：
$\hat{g_j^{cx}}=(g_j^{cx}-d_i^{cx})/d_i^w/varicance[0]$
$\hat{g_j^{cy}}=(g_j^{cy}-d_i^{cy})/d_i^h/varicance[1]$
$\hat{g_j^w}=log(\frac{g_j^w}{d_i^w})/variance[2]$
$\hat{g_j^h}=log(\frac{g_j^h}{d_i^h})/variance[3]$

解碼的過程可以表示為：
$g_{predict}^{cx}=d^w*(variance[0]*l^{cx})+d^{cx}$
$g_{predict}^{cy}=d^h*(variance[1]*l^{cy})+d^{cy}$
$g_{predict}^w=d^wexp(vairance[2]*l^w)$
$g_{predict}^h=d^hexp(vairance[3]*l^h)$

這部分對(duì)應(yīng)的代碼在layers/box_utils.py里面：

def encode(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """

    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]


# Adapted from https://github.com/Hakuyume/chainer-ssd
def decode(loc, priors, variances):
    """Decode locations from predictions using priors to undo
    the encoding we did for offset regression at train time.
    Args:
        loc (tensor): location predictions for loc layers,
            Shape: [num_priors,4]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded bounding box predictions
    """

    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
return boxes

后處理NMS

這部分我在上周的推文講過原理了敏簿，這里不再贅述了。這里IOU閾值取了0.5宣虾。不了解原理可以去看一下我的那篇推文惯裕，也給了源碼講解，地址是：https://mp.weixin.qq.com/s/orYMdwZ1VwwIScPmIiq5iA 绣硝。這部分的代碼也在layers/box_utils.py里面蜻势。就不再拿代碼來贅述了。

檢測(cè)函數(shù)

模型在測(cè)試的時(shí)候鹉胖，需要把loc和conf輸入到detect函數(shù)進(jìn)行nms握玛，然后給出結(jié)果。這部分的代碼在layers/functions/detection.py里面甫菠，如下：

class Detect(Function):
    """At test time, Detect is the final layer of SSD.  Decode location preds,
    apply non-maximum suppression to location predictions based on conf
    scores and threshold to a top_k number of output predictions for both
    confidence score and locations.
    """
    def __init__(self, num_classes, bkg_label, top_k, conf_thresh, nms_thresh):
        self.num_classes = num_classes
        self.background_label = bkg_label
        self.top_k = top_k
        # Parameters used in nms.
        self.nms_thresh = nms_thresh
        if nms_thresh <= 0:
            raise ValueError('nms_threshold must be non negative.')
        self.conf_thresh = conf_thresh
        self.variance = cfg['variance']

    def forward(self, loc_data, conf_data, prior_data):
        """
        Args:
            loc_data: 預(yù)測(cè)出的loc張量挠铲，shape[b,M,4], eg:[b, 8732, 4]
            conf_data:預(yù)測(cè)出的置信度，shape[b,M,num_classes], eg:[b, 8732, 21]
            prior_data:先驗(yàn)框淑蔚，shape[M,4], eg:[8732, 4]
        """
        num = loc_data.size(0)  # batch size
        num_priors = prior_data.size(0)
        output = torch.zeros(num, self.num_classes, self.top_k, 5)# 初始化輸出
        conf_preds = conf_data.view(num, num_priors,
                                    self.num_classes).transpose(2, 1)

        # 解碼loc的信息市殷，變?yōu)檎５腷boxes
        for i in range(num):
            # 解碼loc
            decoded_boxes = decode(loc_data[i], prior_data, self.variance)
            # 拷貝每個(gè)batch內(nèi)的conf，用于nms
            conf_scores = conf_preds[i].clone()
            # 遍歷每一個(gè)類別
            for cl in range(1, self.num_classes):
                # 篩選掉 conf < conf_thresh 的conf
                c_mask = conf_scores[cl].gt(self.conf_thresh)
                scores = conf_scores[cl][c_mask]
                # 如果都被篩掉了刹衫，則跳入下一類
                if scores.size(0) == 0:
                    continue
                # 篩選掉 conf < conf_thresh 的框
                l_mask = c_mask.unsqueeze(1).expand_as(decoded_boxes)
                boxes = decoded_boxes[l_mask].view(-1, 4)
                # idx of highest scoring and non-overlapping boxes per class
                # nms
                ids, count = nms(boxes, scores, self.nms_thresh, self.top_k)
                # nms 后得到的輸出拼接
                output[i, cl, :count] = \
                    torch.cat((scores[ids[:count]].unsqueeze(1),
                               boxes[ids[:count]]), 1)
        flt = output.contiguous().view(num, -1, 5)
        _, idx = flt[:, :, 0].sort(1, descending=True)
        _, rank = idx.sort(1)
        flt[(rank < self.top_k).unsqueeze(-1).expand_as(flt)].fill_(0)
    return output

后記

SSD的核心代碼解析大概就到這里了醋寝，我覺得這個(gè)過程算法還算比較清晰了搞挣，不過SSD能夠表現(xiàn)較好的原因還和它的多種有效的數(shù)據(jù)增強(qiáng)方式有關(guān)，之后我們有機(jī)會(huì)再來解析一下他的數(shù)據(jù)增強(qiáng)策略音羞。本文寫作的目錄參考了知乎https://zhuanlan.zhihu.com/p/79854543囱桨，看代碼和寫作以及理解一些細(xì)節(jié)大概花了一周時(shí)間，看到這里的同學(xué)不妨給我點(diǎn)個(gè)贊吧嗅绰。

歡迎關(guān)注我的微信公眾號(hào)GiantPadaCV舍肠，期待和你一起交流機(jī)器學(xué)習(xí)，深度學(xué)習(xí)窘面，圖像算法翠语，優(yōu)化技術(shù)，比賽及日常生活等财边。

圖片.png

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末肌括，一起剝皮案震驚了整個(gè)濱河市，隨后出現(xiàn)的幾起案子酣难，更是在濱河造成了極大的恐慌谍夭，老刑警劉巖，帶你破解...
沈念sama閱讀 206,723評(píng)論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件憨募，死亡現(xiàn)場(chǎng)離奇詭異紧索，居然都是意外死亡，警方通過查閱死者的電腦和手機(jī)菜谣，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,485評(píng)論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門珠漂，熙熙樓的掌柜王于貴愁眉苦臉地迎上來，“玉大人葛菇，你說我怎么就攤上這事甘磨∠鹦撸” “怎么了眯停？”我有些...
開封第一講書人閱讀 152,998評(píng)論 0贊 344
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長(zhǎng)卿泽。經(jīng)常有香客問我莺债，道長(zhǎng)，這世上最難降的妖魔是什么签夭？我笑而不...
開封第一講書人閱讀 55,323評(píng)論 1贊 279
?港島之戀（遺憾婚禮）
正文為了忘掉前任齐邦，我火速辦了婚禮，結(jié)果婚禮上第租，老公的妹妹穿的比我還像新娘措拇。我一直安慰自己，他們只是感情好慎宾，可當(dāng)我...
茶點(diǎn)故事閱讀 64,355評(píng)論 5贊 374
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布丐吓。她就那樣靜靜地躺著浅悉，像睡著了一般。火紅的嫁衣襯著肌膚如雪券犁。梳的紋絲不亂的頭發(fā)上术健，一...
開封第一講書人閱讀 49,079評(píng)論 1贊 285
城市分裂傳說
那天，我揣著相機(jī)與錄音粘衬，去河邊找鬼荞估。笑死，一個(gè)胖子當(dāng)著我的面吹牛稚新，可吹牛的內(nèi)容都是我干的勘伺。我是一名探鬼主播，決...
沈念sama閱讀 38,389評(píng)論 3贊 400
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼褂删，長(zhǎng)吁一口氣：“原來是場(chǎng)噩夢(mèng)啊……” “哼娇昙！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起笤妙，我...
開封第一講書人閱讀 37,019評(píng)論 0贊 259
萬榮殺人案實(shí)錄
序言：老撾萬榮一對(duì)情侶失蹤冒掌，失蹤者是張志新（化名）和其女友劉穎，沒想到半個(gè)月后蹲盘，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體股毫，經(jīng)...
沈念sama閱讀 43,519評(píng)論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 35,971評(píng)論 2贊 325
?白月光啟示錄
正文我和宋清朗相戀三年召衔，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了铃诬。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點(diǎn)故事閱讀 38,100評(píng)論 1贊 333
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡苍凛，死狀恐怖趣席，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情醇蝴，我是刑警寧澤宣肚，帶...
沈念sama閱讀 33,738評(píng)論 4贊 324
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站悠栓，受9級(jí)特大地震影響霉涨，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜惭适，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,293評(píng)論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一笙瑟、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧癞志，春花似錦往枷、人聲如沸。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,289評(píng)論 0贊 19
一樁弒父案错洁，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽茅信。三九已至，卻和暖如春墓臭，著一層夾襖步出監(jiān)牢的瞬間蘸鲸，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 31,517評(píng)論 1贊 262
情欲美人皮
我被黑心中介騙來泰國打工窿锉，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留酌摇，地道東北人。一個(gè)月前我還...
沈念sama閱讀 45,547評(píng)論 2贊 354
代替公主和親
正文我出身青樓嗡载，卻偏偏與公主長(zhǎng)得像窑多，于是被迫代替她去往敵國和親。傳聞我的和親對(duì)象是個(gè)殘疾皇子洼滚，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,834評(píng)論 2贊 345

目標(biāo)檢測(cè)算法之SSD代碼解析(萬字長(zhǎng)文超詳細(xì))

前言

網(wǎng)絡(luò)結(jié)構(gòu)

源碼解析

網(wǎng)絡(luò)搭建

Anchor生成(Prior_Box層)

網(wǎng)絡(luò)結(jié)構(gòu)

Loss解析

實(shí)現(xiàn)步驟

先驗(yàn)框匹配策略

位置坐標(biāo)轉(zhuǎn)換

IOU計(jì)算

L2標(biāo)準(zhǔn)化

位置信息編解碼

后處理NMS

檢測(cè)函數(shù)

后記

推薦閱讀更多精彩內(nèi)容