Deep Learning Notes (11): Object Detection

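Prerequisites:

Able to build a simple CNN with PyTorch
Familiar with the training process and optimization methods of neural networks
Drawing on the lecture material, understand the content and principles of several classic object detection algorithms (such as Faster R-CNN, YOLO, and SSD)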

Statement:

The code for this lab comes from an open-source project on GitHub: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection#training
To make the code easier for students to follow, we made minor modifications on top of that project.
Because the overall code logic of an object detection task is fairly complex and has many details to understand, we were fortunate to have Li Weipeng join us in designing this lab; he took part in preparing the course material throughout.

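Network Structure

SSD uses VGG16 as its base model and adds convolutional layers on top of it to obtain more feature maps for detection. The SSD network structure is shown in the figure.

image

With VGG16 as the base model, the fully connected layers fc6 and fc7 of VGG16 are converted into the convolutional layers conv6 and conv7, and the pooling layer pool5 is changed from stride=2 to stride=1 (presumably to avoid reducing the feature-map size). To accommodate this change, the Atrous Algorithm is used; that is, conv6 uses a dilated (atrous) convolution. The dropout layers and the fc8 layer are then removed, a series of new convolutional layers is added, and the network is fine-tuned on the detection dataset.
The Conv4_3 layer of VGG16 serves as the first feature map used for detection (its feature-map size is missing from the source text). Because this layer sits relatively early in the network, its norm is relatively large, so an L2 Normalization layer is added after it.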
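To see why the stride change and the dilated convolution go together, the following minimal sketch computes the output size of a pooling or convolution layer along one axis. The concrete settings are assumptions for illustration, not values stated in this text: a 19x19 input to pool5, a 3x3 pool5 with padding 1, and a 3x3 conv6 with padding 6 and dilation 6. The helper name `conv_out_size` is hypothetical.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a conv/pool layer (floor mode)."""
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


size_in = 19  # assumed spatial size entering pool5

# Original VGG16-style pool5: 2x2 window, stride 2 -> the map shrinks.
pool5_original = conv_out_size(size_in, kernel=2, stride=2)

# Modified pool5 (assumed 3x3, stride 1, padding 1) -> the size is preserved.
pool5_modified = conv_out_size(size_in, kernel=3, stride=1, padding=1)

# Dilated conv6 (assumed 3x3, padding 6, dilation 6) -> the size is still preserved,
# while each output position sees a 13x13 region of the input.
conv6_dilated = conv_out_size(pool5_modified, kernel=3, padding=6, dilation=6)

print(pool5_original, pool5_modified, conv6_dilated)
```

Under these assumptions the dilation restores the large receptive field that the stride-2 pooling would otherwise have provided, without shrinking the feature map.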

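Prior Boxes

image

SSD borrows the anchor concept from Faster R-CNN. Each cell is assigned prior boxes with different scales or aspect ratios, and the predicted bounding boxes are expressed relative to these priors, which reduces the difficulty of training to some extent. Typically each cell has several priors that differ in scale and aspect ratio. As the figure shows, each cell uses 4 different priors, and the cat and the dog in the image are each trained with the prior that best fits their shape.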
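As a rough illustration of how several priors per cell can be generated, here is a minimal, dependency-free sketch. The function name `make_priors`, the scales 0.4 and 0.6, and the aspect ratios are all hypothetical choices; the extra square prior at the geometric mean of two scales follows a common SSD convention and is an assumption, not something this text specifies.

```python
from math import sqrt


def make_priors(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return priors as (cx, cy, w, h) in fractional image coordinates."""
    priors = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the cell, normalized to [0, 1].
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            # One prior per aspect ratio; width/height = ratio, area = scale^2.
            for ratio in aspect_ratios:
                priors.append((cx, cy, scale * sqrt(ratio), scale / sqrt(ratio)))
            # One extra square prior at an intermediate scale (assumed convention).
            extra = sqrt(scale * next_scale)
            priors.append((cx, cy, extra, extra))
    return priors


priors = make_priors(fmap_size=2, scale=0.4, next_scale=0.6)
print(len(priors))   # 2 x 2 cells, 4 priors each
print(priors[:4])    # the 4 priors of the top-left cell
```

With these settings every cell gets 4 priors, matching the count mentioned above.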

Dataset

The dataset for an object detection task is organized very differently from the classification datasets we studied earlier. A conventional classification dataset roughly contains
[image,label],
Object detection has to perform both regression of the detection boxes and classification of the object inside each box, so the dataset is organized roughly as follows
[{'boxes':[[ground_truth coordinates 1],[ground_truth coordinates 2],...]},{'labels':[ground_truth label 1, ground_truth label 2,...]}]
Because the data does not have a regular shape, we usually store it in JSON files.
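A minimal sketch of why JSON suits this layout: each image can carry a different number of boxes and labels, and the structure survives a round trip unchanged. The coordinates and label indices below are made up for illustration, and the one-dict-per-image arrangement with 'boxes' and 'labels' keys is an assumption modeled on how the dataset class in these notes reads its objects.

```python
import json

# One entry per image; the number of boxes varies from image to image.
objects = [
    {"boxes": [[48, 240, 195, 371], [8, 12, 352, 498]], "labels": [12, 15]},
    {"boxes": [[139, 200, 207, 301]], "labels": [7]},
]

text = json.dumps(objects)      # serialize to a JSON string
restored = json.loads(text)     # parse it back into Python lists and dicts

# Number of ground truth boxes in each image.
counts = [len(entry["boxes"]) for entry in restored]
print(counts)
```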

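First, the txt files provided with the dataset must be converted into .json files so that the rewritten dataset class can load the data later.
The main job of create_data_lists() is to link each image with its ground truth boxes and the labels of those boxes, and to save the result to JSON files.
Note: this function must be run once.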

from utils import *
create_data_lists(voc07_path='./data1/VOC2007',output_folder='./json1/')

There are 200 training images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.

There are 200 validation images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.

import torch
from torch.utils.data import Dataset
import json
import os
from PIL import Image
from utils import transform


class PascalVOCDataset(Dataset):
    """
    A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.
    """

    def __init__(self, data_folder, split, keep_difficult=False):
        """
        :param data_folder: folder where data files are stored
        :param split: split, one of 'TRAIN' or 'TEST'
        :param keep_difficult: keep or discard objects that are considered difficult to detect?
        """
        self.split = split.upper()

        assert self.split in {'TRAIN', 'TEST'}

        self.data_folder = data_folder
        self.keep_difficult = keep_difficult

        # Read data files
        with open(os.path.join(data_folder, self.split + '_images.json'), 'r') as j:
            self.images = json.load(j)
        with open(os.path.join(data_folder, self.split + '_objects.json'), 'r') as j:
            self.objects = json.load(j)

        assert len(self.images) == len(self.objects)

    def __getitem__(self, i):
        # Read image
        image = Image.open(self.images[i], mode='r')
        image = image.convert('RGB')

        # Read objects in this image (bounding boxes, labels, difficulties)
        objects = self.objects[i]
        boxes = torch.FloatTensor(objects['boxes'])  # (n_objects, 4)
        labels = torch.LongTensor(objects['labels'])  # (n_objects)
        difficulties = torch.ByteTensor(objects['difficulties'])  # (n_objects)

        # Discard difficult objects, if desired
        if not self.keep_difficult:
            boxes = boxes[1 - difficulties]
            labels = labels[1 - difficulties]
            difficulties = difficulties[1 - difficulties]

        # Apply transformations
        image, boxes, labels, difficulties = transform(image, boxes, labels, difficulties, split=self.split)

        return image, boxes, labels, difficulties

    def __len__(self):
        return len(self.images)

    def collate_fn(self, batch):
        """
        Since each image may have a different number of objects, we need a collate function (to be passed to the DataLoader).

        This describes how to combine these tensors of different sizes. We use lists.

        Note: this need not be defined in this Class, can be standalone.

        :param batch: an iterable of N sets from __getitem__()
        :return: a tensor of images, lists of varying-size tensors of bounding boxes, labels, and difficulties
        """

        images = list()
        boxes = list()
        labels = list()
        difficulties = list()

        for b in batch:
            images.append(b[0])
            boxes.append(b[1])
            labels.append(b[2])
            difficulties.append(b[3])

        images = torch.stack(images, dim=0)

        return images, boxes, labels, difficulties  # tensor (N, 3, 300, 300), 3 lists of N tensors each

With the dataset class rewritten, let's look at the form in which the training data for an object detection task is actually stored.

data_folder = './json1/'
keep_difficult = True
batch_size = 1
workers = 1


train_dataset = PascalVOCDataset(data_folder,
                                     split='train',
                                     keep_difficult=keep_difficult)
val_dataset = PascalVOCDataset(data_folder,
                                   split='test',
                                   keep_difficult=keep_difficult)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                                               collate_fn=train_dataset.collate_fn, num_workers=workers,
                                               pin_memory=True)  
                                                # note that we're passing the collate function here
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=True,
                                             collate_fn=val_dataset.collate_fn, num_workers=workers,
                                             pin_memory=True)
    
# for data in train_loader:
#     images, boxes, labels, difficulties = data
#     print('images---->', images)
#     print('boxes---->', boxes)
#     print('labels---->',labels)
#     print('difficulties---->',difficulties)

Loss

The SSD loss has two parts: the confidence loss and the location loss. The confidence loss is the classification error of the bounding boxes and uses cross entropy loss; the location loss is the regression error between the bounding-box positions and the ground truth and uses smooth L1 loss.

For the location loss, the formulas are given below. $g_j^{cx}$ , $g_j^{cy}$ , $g_j^{w}$ , $g_j^{h}$ are the four position values of the j-th ground truth bbox (center x and y coordinates, and the width and height of the bbox). $d_i^{cx}$ , $d_i^{cy}$ , $d_i^{w}$ , $d_i^{h}$ are the four position values of the i-th prior (center x and y coordinates, and the width and height of the bbox). $\hat{g}_j^{cx}$ , $\hat{g}_j^{cy}$ , $\hat{g}_j^{w}$ , $\hat{g}_j^{h}$ are the transform (or offset) values computed from ground truth bbox j and prior i.

$\hat{g}_{j}^{c x}=\left(g_{j}^{c x}-d_{i}^{c x}\right) / d_{i}^{w}, \hat{g}_{j}^{c y}=\left(g_{j}^{c y}-d_{i}^{c y}\right) / d_{i}^{h}$

$\hat{g}_{j}^{w}=\log \left(\frac{g_{j}^{w}}{d_{i}^{w}}\right), \hat{g}_{j}^{h}=\log \left(\frac{g_{j}^{h}}{d_{i}^{h}}\right)$
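The encoding above can be checked numerically with a small sketch. The function name `encode` and the sample boxes are hypothetical; the code applies the formulas exactly as written here, without the extra scaling factors that some implementations add.

```python
from math import log


def encode(gt_cxcywh, prior_cxcywh):
    """Offsets (g_hat) of a ground truth box relative to a prior, both as (cx, cy, w, h)."""
    g_cx, g_cy, g_w, g_h = gt_cxcywh
    d_cx, d_cy, d_w, d_h = prior_cxcywh
    return (
        (g_cx - d_cx) / d_w,   # center shift in units of prior width
        (g_cy - d_cy) / d_h,   # center shift in units of prior height
        log(g_w / d_w),        # log width ratio
        log(g_h / d_h),        # log height ratio
    )


# A ground truth box twice as wide as the prior, shifted right by half a prior width.
offsets = encode((0.6, 0.5, 0.4, 0.2), (0.5, 0.5, 0.2, 0.2))
print(offsets)
```

A ground truth box identical to the prior encodes to all zeros, which is one reason the offsets are convenient regression targets.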

Our goal is for the CNN to learn these transform (or offset) values, that is, to make the output loc values approach them. Once the model is trained, detection only requires decoding the loc values output by the CNN against the position values of the priors. The decoding formulas are given below, where for the i-th prior, $d_i^{cx}$ , $d_i^{cy}$ , $d_i^{w}$ , $d_i^{h}$ are the position values of the prior, $l_{i}^{cx}$ , $l_{i}^{cy}$ , $l_{i}^{w}$ , $l_{i}^{h}$ are the transform/offset values output by the model, and $b_{i}^{cx}$ , $b_{i}^{cy}$ , $b_{i}^{w}$ , $b_{i}^{h}$ are the position values of the detected object in the image.

$b_{i}^{w}=d_{i}^{w}\exp{(l_{i}^{w})}, b_{i}^{h}=d_{i}^{h}\exp{(l_{i}^{h})}$

$b_{i}^{cx}=d_{i}^{w}l_{i}^{cx} + d_{i}^{cx}, b_{i}^{cy}= d_{i}^{h}l_{i}^{cy} + d_{i}^{cy}$
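The decoding step can be sketched the same way. The function name `decode` and the sample values are hypothetical; the code follows the decoding formulas as written here, again without any additional scaling factors.

```python
from math import exp


def decode(offsets, prior_cxcywh):
    """Recover a box (cx, cy, w, h) from model offsets and the prior they refer to."""
    l_cx, l_cy, l_w, l_h = offsets
    d_cx, d_cy, d_w, d_h = prior_cxcywh
    return (
        d_w * l_cx + d_cx,   # shift the center by a fraction of the prior width
        d_h * l_cy + d_cy,   # shift the center by a fraction of the prior height
        d_w * exp(l_w),      # scale the prior width
        d_h * exp(l_h),      # scale the prior height
    )


prior = (0.5, 0.5, 0.2, 0.2)
box = decode((0.5, 0.0, 0.0, 0.0), prior)   # moves the center right by half a prior width
unchanged = decode((0.0, 0.0, 0.0, 0.0), prior)
print(box, unchanged)
```

All-zero offsets return the prior itself, so decoding is the inverse of the encoding used to build the regression targets.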

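The location loss is given below, where $l^{m},m\in\{cx, cy, w, h\}$ denotes the loc values the CNN outputs for each prior, and $\hat{g}^{m}$ denotes the transform values computed from ground truth box j and prior i. $x_{ij}^k \in \{0,1\}$ is an indicator: $x_{ij}^k=1$ means that prior i is matched with ground truth box j and that ground truth box j has class k. Smooth L1 loss is used here so that the loc values learned by the model approach the transform values obtained from the priors and the ground truth boxes. Pos denotes the set of non-background priors: the IoU between each prior and each ground truth box is computed, and a prior whose largest IoU is below a threshold is treated as Negative (background), otherwise as Positive (non-background).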

$L_{loc}(x, l, g)=\sum_{i \in Pos} \sum_{m \in\{cx, cy, w, h\}} x_{ij}^{k} \operatorname{smooth}_{\mathrm{L1}}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$
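For a single matched prior, the location loss reduces to a sum of smooth L1 terms over the four coordinates. The sketch below uses the standard smooth L1 definition with a switch point at 1, which is an assumption since this text does not spell the function out; the names `smooth_l1` and `loc_loss` are hypothetical. Note that the formula sums the terms, whereas a library loss may average them by default.

```python
def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (switch point at |x| = 1)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def loc_loss(pred_offsets, target_offsets):
    """Sum of smooth L1 terms over (cx, cy, w, h) for one matched prior."""
    return sum(smooth_l1(l - g) for l, g in zip(pred_offsets, target_offsets))


# Differences are 0, -0.2, 0 and 2.0: one small error and one large error.
loss = loc_loss((0.5, 0.0, 0.7, 2.0), (0.5, 0.2, 0.7, 0.0))
print(loss)
```

The small error contributes quadratically (0.02) and the large one linearly (1.5), which is what makes the loss less sensitive to outliers than a plain squared error.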

For the confidence loss, shown below, $x_{ij}^p \in \{0,1\}$ is an indicator: $x_{ij}^p=1$ means that prior i is matched with ground truth box j and that ground truth box j has class p (its label). Cross entropy loss is used directly to compute the confidence error. $\vec{c_i}$ denotes the model's confidence output (after softmax) over all classes for prior i. Pos denotes the set of non-background priors, and Neg denotes the set of background priors.
\begin{equation}
L_{conf} = \sum_{i \in Pos}x_{ij}^pCrossEntropy(\vec{c_i}, p) + \sum_{i \in Neg}CrossEntropy(\vec{c_i}, 0)
\end{equation}

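In general, the number of background priors in object detection far exceeds the number of priors that contain an object. To address this imbalance, the SSD code uses hard negative mining: among the negatives (priors treated as background), only those with the largest loss values are selected.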
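Hard negative mining for one image can be sketched without any framework. The per-prior loss values below are made up, the function name is hypothetical, and the ratio of 3 negatives per positive is an assumption taken from the default used in the loss code of these notes; the sketch also assumes the image has at least one positive prior.

```python
def hard_negative_conf_loss(losses, is_positive, neg_pos_ratio=3):
    """Confidence loss over positives plus the hardest negatives, averaged over positives."""
    pos = [loss for loss, p in zip(losses, is_positive) if p]
    # Sort the negatives from hardest (largest loss) to easiest.
    neg = sorted((loss for loss, p in zip(losses, is_positive) if not p), reverse=True)
    n_hard = neg_pos_ratio * len(pos)
    hard_neg = neg[:n_hard]          # keep only the hardest negatives
    return (sum(pos) + sum(hard_neg)) / len(pos)


# One positive prior and six background priors with made-up cross entropy values.
losses = [2.0, 0.1, 0.9, 0.05, 0.6, 0.3, 0.02]
is_positive = [True, False, False, False, False, False, False]
conf_loss = hard_negative_conf_loss(losses, is_positive)
print(conf_loss)
```

Here only the three hardest negatives (0.9, 0.6, 0.3) enter the loss; the three easy ones are ignored, so the many trivially classified background priors do not dominate training.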

import torch.nn as nn
class MultiBoxLoss(nn.Module):
    """
    The MultiBox loss, a loss function for object detection.

    This is a combination of:
    (1) a localization loss for the predicted locations of the boxes, and
    (2) a confidence loss for the predicted class scores.
    """

    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        self.priors_cxcy = priors_cxcy
        self.priors_xy = cxcy_to_xy(priors_cxcy)
        self.threshold = threshold
        self.neg_pos_ratio = neg_pos_ratio
        self.alpha = alpha

        self.smooth_l1 = nn.SmoothL1Loss()
        self.cross_entropy = nn.CrossEntropyLoss(reduce=False)

    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        """
        Forward propagation.

        :param predicted_locs: predicted locations/boxes w.r.t the 8732 prior boxes, a tensor of dimensions (N, 8732, 4)
        :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 8732, n_classes)
        :param boxes: true  object bounding boxes in boundary coordinates, a list of N tensors
        :param labels: true object labels, a list of N tensors
        :return: multibox loss, a scalar
        """
        batch_size = predicted_locs.size(0)
        n_priors = self.priors_cxcy.size(0)
        n_classes = predicted_scores.size(2)

        assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)

        true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 8732, 4)
        true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 8732)

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i],
                                           self.priors_xy)  # (n_objects, 8732)

            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (8732)

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object.
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)

            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.

            # Labels for each prior
            label_for_each_prior = labels[i][object_for_each_prior]  # (8732)
            # Set priors whose overlaps with objects are less than the threshold to be background (no object)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0  # (8732)

            # Store
            true_classes[i] = label_for_each_prior

            # Encode center-size object coordinates into the form we regressed predicted boxes to
            true_locs[i] = cxcy_to_gcxgcy(xy_to_cxcy(boxes[i][object_for_each_prior]), self.priors_cxcy)  # (8732, 4)

        # Identify priors that are positive (object/non-background)
        positive_priors = true_classes != 0  # (N, 8732)

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a torch.uint8 (byte) tensor flattens the tensor when indexing is across multiple dimensions (N & 8732)
        # So, if predicted_locs has the shape (N, 8732, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 8732)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 8732)

        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 8732)
        conf_loss_neg[positive_priors] = 0.  # (N, 8732), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 8732), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 8732)
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 8732)
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))

        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        
        return conf_loss + self.alpha * loc_loss

Model Training

The main difference between training an object detection model and training a classification model lies in the inputs to the loss function. A typical classification loss takes (predictions, labels), whereas the SSD loss takes (predicted box values, predicted class scores, ground truth boxes, class labels).
Note: the code below is a simplified version of the train() function. It omits some bookkeeping of other statistics, mainly so that you can compare how training SSD resembles and differs from training a classification network.

Do not run the train_model() function!!!

def train_model(train_loader, model, criterion, optimizer, epoch):
    """
    One epoch's training.

    :param train_loader: DataLoader for training data
    :param model: model
    :param criterion: MultiBox loss
    :param optimizer: optimizer
    :param epoch: epoch number
    """
    model.train()  # training mode enables dropout

    # Batches
    for i, (images, boxes, labels, _) in enumerate(train_loader):

        # Move to default device
        images = images.to(device)  # (batch_size (N), 3, 300, 300)
        boxes = [b.to(device) for b in boxes]
        labels = [l.to(device) for l in labels]

        # Forward prop.
        predicted_locs, predicted_scores = model(images)  # (N, 8732, 4), (N, 8732, n_classes)

        # Loss
        loss = criterion(predicted_locs, predicted_scores, boxes, labels)  # scalar

        # Backward prop.
        optimizer.zero_grad()
        loss.backward()

        # Update model
        optimizer.step()

        # Print status
        if i % print_freq == 0:
            # The running-average meter is omitted in this simplified version, so print the current batch loss
            print('Loss {:.4f}'.format(loss.item()))
    # free some memory since their histories may be stored
    del predicted_locs, predicted_scores, images, boxes, labels

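Homework:

Complete the code missing from the training process (recall the steps from week 3 for training a simple classification network)
To fill in:
The missing arguments of the loss function (read the loss code above to understand which arguments the loss computation needs)
Backpropagation
Updating the model parameters

Model Parameter Initialization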

import time
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
from model import SSD300, MultiBoxLoss
from datasets import PascalVOCDataset
from utils import *
from utils1 import *

data_folder = './json1'  # folder with data files
keep_difficult = True  # use objects considered difficult to detect?

# Model parameters
# Not too many here since the SSD300 has a very specific structure
n_classes = len(label_map)  # number of different types of objects
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Learning parameters
checkpoint = "checkpoint_ssd300.pth.tar"  # path to model checkpoint, None if none
batch_size = 1  # batch size
start_epoch = 0  # start at this epoch
epochs = 5  # number of epochs to run without early-stopping
epochs_since_improvement = 0  # number of epochs since there was an improvement in the validation metric
best_loss = 100.  # assume a high loss at first
workers = 1  # number of workers for loading data in the DataLoader
print_freq = 20  # print training or validation status every __ batches
lr = 1e-3  # learning rate
momentum = 0.9  # momentum
weight_decay = 5e-4  # weight decay
grad_clip = None  # clip if gradients are exploding, which may happen at larger batch sizes (sometimes at 32) - you will recognize it by a sorting error in the MultiBox loss calculation

cudnn.benchmark = True

Model Training and Evaluation

This part is the main body of the whole project.

def main():
    """
    Training and validation.
    """
    global epochs_since_improvement, start_epoch, label_map, best_loss, epoch, checkpoint
    
    optimizer, model = init_optimizer_and_model()
    
    # Move to default device
    model = model.to(device)
    criterion = MultiBoxLoss(priors_cxcy=model.priors_cxcy).to(device)
    
    # Epochs
    for epoch in range(start_epoch, epochs):
        # Paper describes decaying the learning rate at the 80000th, 100000th, 120000th 'iteration', i.e. model update or batch
        # The paper uses a batch size of 32, which means there were about 517 iterations in an epoch
        # Therefore, to find the epochs to decay at, you could do,
        # if epoch in {80000 // 517, 100000 // 517, 120000 // 517}:
        #     adjust_learning_rate(optimizer, 0.1)

        # In practice, I just decayed the learning rate when loss stopped improving for long periods,
        # and I would resume from the last best checkpoint with the new learning rate,
        # since there's no point in resuming at the most recent and significantly worse checkpoint.
        # So, when you're ready to decay the learning rate, just set checkpoint = 'BEST_checkpoint_ssd300.pth.tar' above
        # and have adjust_learning_rate(optimizer, 0.1) BEFORE this 'for' loop

        # One epoch's training
        train(train_loader=train_loader,
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              epoch=epoch)

        # One epoch's validation
        val_loss = validate(val_loader=val_loader,
                            model=model,
                            criterion=criterion)

        # Did validation loss improve?
        is_best = val_loss < best_loss
        best_loss = min(val_loss, best_loss)

        if not is_best:
            epochs_since_improvement += 1
            print("\nEpochs since last improvement: %d\n" % (epochs_since_improvement,))

        else:
            epochs_since_improvement = 0

        # Save checkpoint
        save_checkpoint(epoch, epochs_since_improvement, model, optimizer, val_loss, best_loss, is_best)
        
if __name__ == '__main__':
    main()

Loaded base model.



/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))


Epoch: [0][0/200]   Batch Time 1.336 (1.336)    Data Time 0.145 (0.145) Loss 21.1961 (21.1961)  
[0/200] Batch Time 0.127 (0.127)    Loss 4.8454 (4.8454)    

 * LOSS - 13.931

Epoch: [1][0/200]   Batch Time 0.188 (0.188)    Data Time 0.137 (0.137) Loss 7.1951 (7.1951)    
[0/200] Batch Time 0.127 (0.127)    Loss 53.9689 (53.9689)  

 * LOSS - 15.314


Epochs since last improvement: 1

Epoch: [2][0/200]   Batch Time 0.201 (0.201)    Data Time 0.153 (0.153) Loss 58.9795 (58.9795)
[0/200] Batch Time 0.134 (0.134)    Loss 4.1666 (4.1666)

 * LOSS - 14.845


Epochs since last improvement: 2

Epoch: [3][0/200]   Batch Time 0.185 (0.185)    Data Time 0.139 (0.139) Loss 3.7385 (3.7385)
[0/200] Batch Time 0.123 (0.123)    Loss 41.0693 (41.0693)

 * LOSS - 19.491


Epochs since last improvement: 3

Epoch: [4][0/200]   Batch Time 0.192 (0.192)    Data Time 0.148 (0.148) Loss 4.8312 (4.8312)
[0/200] Batch Time 0.123 (0.123)    Loss 4.1246 (4.1246)

 * LOSS - 12.118

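Object Detection

Use the trained model to detect and classify the objects in an image, and display them with bounding boxes.
Change the img_path variable to choose which image to run detection on.
The images available in the test set are listed in ./json1/TEST_images.json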

from detect import *
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
%matplotlib inline

if __name__ == '__main__':
    img_path = './data1/VOC2007/JPEGImages/000220.jpg'
    original_image = Image.open(img_path, mode='r')
    original_image = original_image.convert('RGB')
    img = detect(original_image, min_score=0.2, max_overlap=0.1, top_k=200)
    plt.imshow(img)
    plt.show()

image

Homework:

After checking what each parameter means in the source code, try changing the values of min_score, max_overlap, and top_k, and analyze how the detection results change.

image

Each box carries a class score, and min_score is the threshold that score must exceed for the box to be kept. If the threshold is too high, no box qualifies and no boxes are drawn. If it is too low, many low-confidence boxes pass and the image fills with boxes.
Two boxes may overlap, and max_overlap sets how much overlap is tolerated. Note that max_overlap matters only when there are at least two boxes. If max_overlap is large, overlapping boxes are treated leniently; with max_overlap = 1. two boxes may even coincide. When max_overlap is small, the boxes that remain are more independent of each other.
top_k caps the number of boxes that are kept. If only one object is detected, a large top_k has no effect, but a smaller top_k can compensate for the spurious boxes that a low min_score lets through.

Answer:

min_score is the minimum score a box needs in order to be considered a match for a class and displayed. If it is lowered, many incorrect boxes appear; if it is raised too far, even the correct boxes disappear.
max_overlap is the maximum overlap allowed between two boxes. If it is raised, the image may show two boxes that actually describe the same object.
top_k is the number of boxes to display. Since the image originally contains only one box, changing this value has no effect as long as it is not set to 0.
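The effect of min_score and top_k can be illustrated on a handful of made-up scores. This is only a sketch: the function name `filter_detections` is hypothetical, the suppression step controlled by max_overlap is deliberately left out, and the real detect() function may apply these steps in a different order.

```python
def filter_detections(scores, min_score, top_k):
    """Indices of boxes scoring above min_score, best first, capped at top_k."""
    kept = [(score, index) for index, score in enumerate(scores) if score > min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in kept[:top_k]]


scores = [0.9, 0.15, 0.4, 0.25]   # made-up class scores for four candidate boxes

default = filter_detections(scores, min_score=0.2, top_k=200)
strict = filter_detections(scores, min_score=0.95, top_k=200)   # threshold too high
capped = filter_detections(scores, min_score=0.1, top_k=2)      # low threshold, small top_k
print(default, strict, capped)
```

Raising min_score past the best score leaves nothing to draw, while a small top_k trims the extra boxes that a low min_score admits.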

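Model Evaluation

Compute the model's average precision for each class. Because our dataset was trimmed down to only two of the VOC classes, at most those two classes have a nonzero value; all other classes score 0.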

from eval import *
if __name__ == '__main__':
    evaluate(test_loader, model)

{'aeroplane': 0.0,
 'bicycle': 0.0,
 'bird': 0.0,
 'boat': 0.0,
 'bottle': 0.0,
 'bus': 0.0,
 'car': 0.162913516163826,
 'cat': 0.0,
 'chair': 0.0,
 'cow': 0.0,
 'diningtable': 0.0,
 'dog': 0.0,
 'horse': 0.0,
 'motorbike': 0.0,
 'person': 0.0,
 'pottedplant': 0.0,
 'sheep': 0.0,
 'sofa': 0.0,
 'train': 0.0,
 'tvmonitor': 0.0}

Mean Average Precision (mAP): 0.008

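Additional Notes on the Code

The code above was extracted from the source and modified where necessary, mainly to present the training, evaluation, and detection stages of an object detection task with a clear logical structure. Once you have a basic grasp of the material above, the following shows the proper way to run the source as .py files.
Note: do not run the code below during the lab session. The parameters in the source are the defaults for training this model; they can produce a reasonably good detector but consume a lot of GPU resources. If you have your own GPU, you may run it on your own machine after class. Please do not waste GPU resources by running it in class. Thank you for your cooperation.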

import train

# train model
# Setting the parameters you want in train.py file
train.main()

import detect
# detect
# Setting the parameters you want in detect.py file
detect.main()

Loaded checkpoint from epoch 11. Best loss so far is 5.796.




<Figure size 640x480 with 1 Axes>

import eval

# evaluate the model
eval.main()

Further Reading

This part offers more detailed explanations of the code for students who want to learn more about the SSD algorithm and are interested in object detection. Object detection relies on many utility functions for data processing and statistics, most of which live in utils.py, so we give a general overview of that file and explain the more important parts in detail.

Walkthrough of utils.py

The functions include parse_annotation(), create_data_lists(), decimate(), calculate_mAP(), xy_to_cxcy(), cxcy_to_xy(), cxcy_to_gcxgcy(), gcxgcy_to_cxcy(), find_intersection(), find_jaccard_overlap(), expand(), random_crop(), flip(), resize(), photometric_distort(), transform(),
adjust_learning_rate(), accuracy(), save_checkpoint(), clip_gradient(), and others.
Some of these are not covered in detail here; interested readers can study them after class. Examples are the image data augmentation functions expand(), random_crop(), flip(), resize(), photometric_distort(), transform(), which are useful not only for object detection but also for classification and other fields.
adjust_learning_rate(), accuracy(), save_checkpoint(), clip_gradient() and similar functions are general-purpose deep learning utilities and are not covered in detail either.
parse_annotation() mainly helps create_data_lists() parse the XML files of the VOC2007 dataset, while create_data_lists() parses the original VOC2007 dataset and generates the JSON files that are loaded during training: TRAIN_images.json, TRAIN_objects.json, TEST_images.json, TEST_objects.json, label_map.json.
decimate() subsamples at regular intervals when the fully connected layers are converted into convolutions, in order to achieve the effect of atrous convolution.
calculate_mAP() computes the mAP, or Mean Average Precision, which in recent years has been an important metric for measuring the performance of object detection algorithms. Its core idea is as follows:

Sort all detection_box entries by detection_score

Compute the IoU between each detection_box and every groundtruth_box

Use the groundtruth_box with the largest IoU (max_IOU) to decide whether the prediction of this detection_box is correct; based on max_IOU, classify the prediction as TP or FP and draw the PR curve, then average over the classes to obtain the mAP.

A good resource for a deeper look: mAP詳解 (a detailed explanation of mAP)
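The steps above can be sketched for a single class in plain Python. Several choices here are assumptions rather than a description of calculate_mAP(): the 0.5 IoU threshold, the rule that each ground truth box can be matched only once, the 11-point interpolation of the PR curve, and the omission of "difficult" objects. The function names and sample boxes are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def average_precision(detections, gt_boxes, iou_threshold=0.5):
    """AP for one class; detections is a list of (score, box), gt_boxes must be non-empty."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)  # step 1
    matched = [False] * len(gt_boxes)
    precisions, recalls, true_positives = [], [], 0
    for rank, (_, box) in enumerate(detections, start=1):
        ious = [box_iou(box, gt) for gt in gt_boxes]                   # step 2
        best = max(range(len(gt_boxes)), key=lambda j: ious[j])        # step 3
        if ious[best] > iou_threshold and not matched[best]:
            matched[best] = True
            true_positives += 1                                        # TP
        precisions.append(true_positives / rank)                       # FP lowers precision
        recalls.append(true_positives / len(gt_boxes))
    # 11-point interpolation: best precision at recall >= 0.0, 0.1, ..., 1.0
    total = 0.0
    for k in range(11):
        reachable = [p for p, r in zip(precisions, recalls) if r >= k / 10]
        total += max(reachable) if reachable else 0.0
    return total / 11


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(dets, gt)
print(ap)
```

Averaging this per-class value over all classes gives the mAP; in the example the single false positive ranked second pulls the AP below 1.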

find_intersection() and find_jaccard_overlap() compute the IoU, which was introduced in the earlier segmentation lab. They are implemented as follows:

def find_intersection(set_1, set_2):
    """
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.

    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # PyTorch auto-broadcasts singleton dimensions
    lower_bounds = torch.max(set_1[:, :2].unsqueeze(1), set_2[:, :2].unsqueeze(0))  # (n1, n2, 2)
    upper_bounds = torch.min(set_1[:, 2:].unsqueeze(1), set_2[:, 2:].unsqueeze(0))  # (n1, n2, 2)
    intersection_dims = torch.clamp(upper_bounds - lower_bounds, min=0)  # (n1, n2, 2)
    return intersection_dims[:, :, 0] * intersection_dims[:, :, 1]  # (n1, n2)

Once the intersection is computed, the IoU is straightforward: divide the intersection by the union.

def find_jaccard_overlap(set_1, set_2):
    """
    Find the Jaccard Overlap (IoU) of every box combination between two sets of boxes that are in boundary coordinates.

    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: Jaccard Overlap of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # Find intersections
    intersection = find_intersection(set_1, set_2)  # (n1, n2)

    # Find areas of each box in both sets
    areas_set_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
    areas_set_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)

    # Find the union
    # PyTorch auto-broadcasts singleton dimensions
    union = areas_set_1.unsqueeze(1) + areas_set_2.unsqueeze(0) - intersection  # (n1, n2)

    return intersection / union  # (n1, n2)
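As a quick numeric check of the same computation, here is a dependency-free version for a single pair of boxes. It is an illustrative sketch with a hypothetical name (`pair_iou`) and made-up boxes, not a replacement for the batched tensor functions above.

```python
def pair_iou(a, b):
    """IoU of two boxes in boundary coordinates (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at 0 for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union


overlapping = pair_iou((0, 0, 2, 2), (1, 1, 3, 3))   # overlap 1, union 4 + 4 - 1 = 7
disjoint = pair_iou((0, 0, 1, 1), (2, 2, 3, 3))      # no overlap
identical = pair_iou((0, 0, 2, 2), (0, 0, 2, 2))     # same box
print(overlapping, disjoint, identical)
```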

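Non-Maximum Suppression (NMS)

NMS is an important algorithm in object detection. Its job is to remove the redundant boxes left after the model makes its predictions, as shown below:

Before NMS

image
After NMS

image
Algorithm

Set an IoU threshold, say 0.5. Among the boxes of each class, pick the one with the highest score, call it box_best, and keep it

Compute the IoU between box_best and each remaining box; if the IoU > 0.5, discard that box (the two boxes probably represent the same object, so the one with the higher score is kept)

From the boxes that remain, pick the one with the highest score again, and repeat

A simple example

Suppose the sliding windows are A, B, C, D, E, F, G, H, I, and J, that A has the highest score, and that windows with IoU > 0.7 are eliminated. Round 1: compute the IoU with A; B, E, and G exceed 0.7 and are removed, leaving C, D, F, H, I, J. Round 2: suppose F has the highest score among C, D, F, H, I, J; compute the IoU with F; D, H, and I exceed 0.7 and are removed, leaving C and J. Round 3: suppose C has the highest score of C and J; compute the IoU of J with C; if the result exceeds 0.7, J is removed, and A, F, C are the selected windows.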
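The rounds described above amount to a short greedy loop. The sketch below is a plain-Python illustration for one class, with hypothetical function names and made-up boxes and scores; it is not the implementation used in the SSD code.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, max_overlap):
    """Greedy NMS; returns the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)      # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= max_overlap]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept_strict = nms(boxes, scores, max_overlap=0.5)    # boxes 0 and 1 overlap by about 0.68
kept_lenient = nms(boxes, scores, max_overlap=0.7)
print(kept_strict, kept_lenient)
```

With the stricter threshold the second box is suppressed as a duplicate of the first; with the more lenient one all three survive.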

Implementation of Non-Maximum Suppression in SSD

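def NMS(n_classes, predicted_scores, min_score, decoded_locs, max_overlap, image_boxes,
        image_labels, image_scores):
    # predicted_scores holds the class scores for a single image, (8732, n_classes);
    # decoded_locs holds the decoded boxes for the same image, (8732, 4).
    # `device` is assumed to be defined at module level, as in the earlier cells.
    for c in range(1, n_classes):
        # Keep only predicted boxes and scores where scores for this class are above the minimum score
        class_scores = predicted_scores[:, c]  # (8732)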
        score_above_min_score = class_scores > min_score  # torch.uint8 (byte) tensor, for indexing
        n_above_min_score = score_above_min_score.sum().item()
        if n_above_min_score == 0:
            continue
        class_scores = class_scores[score_above_min_score]  # (n_qualified), n_min_score <= 8732
        class_decoded_locs = decoded_locs[score_above_min_score]  # (n_qualified, 4)

        # Sort predicted boxes and scores by scores
        class_scores, sort_ind = class_scores.sort(dim=0, descending=True)  # (n_qualified), (n_min_score)
        class_decoded_locs = class_decoded_locs[sort_ind]  # (n_min_score, 4)

        # Find the overlap between predicted boxes
        overlap = find_jaccard_overlap(class_decoded_locs, class_decoded_locs)  # (n_qualified, n_min_score)

        # Non-Maximum Suppression (NMS)

        # A torch.uint8 (byte) tensor to keep track of which predicted boxes to suppress
        # 1 implies suppress, 0 implies don't suppress
        suppress = torch.zeros((n_above_min_score), dtype=torch.uint8).to(device)  # (n_qualified)

        # Consider each box in order of decreasing scores
        for box in range(class_decoded_locs.size(0)):
            # If this box is already marked for suppression
            if suppress[box] == 1:
                continue

            # Suppress boxes whose overlaps (with this box) are greater than maximum overlap
            # Find such boxes and update suppress indices
            suppress = torch.max(suppress, overlap[box] > max_overlap)
            # The max operation retains previously suppressed boxes, like an 'OR' operation

            # Don't suppress this box, even though it has an overlap of 1 with itself
            suppress[box] = 0

        # Store only unsuppressed boxes for this class
        image_boxes.append(class_decoded_locs[1 - suppress])
        image_labels.append(torch.LongTensor((1 - suppress).sum().item() * [c]).to(device))
        image_scores.append(class_scores[1 - suppress])

Last edited: 2019.08.02 09:08:38
