Deep Learning Notes (11) — Object Detection

Prerequisites:

  • Able to build a simple CNN with PyTorch
  • Familiar with the training process and optimization methods of neural networks
  • Combined with the lecture material, understand the content and principles of several classic object detection algorithms (e.g., Faster R-CNN / YOLO / SSD)

Notes:

  • The code for this lab comes from an open-source project on GitHub: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection#training
  • To make the code easier for students to understand, we made slight modifications on top of that project
  • Since the overall code logic of an object detection task is fairly complex and there are many details to understand, we were fortunate to have 李偉鵬 (Li Weipeng) participate throughout the preparation of this courseware.

Network Architecture

SSD uses VGG16 as its base model and adds extra convolutional layers on top of it to obtain more feature maps for detection. The SSD network architecture is shown in the figure below.

image

VGG16 is used as the base model: its fully connected layers fc6 and fc7 are converted into the convolutional layers conv6 and conv7, and the pooling layer pool5 is changed from the original 2×2 with stride=2 to 3×3 with stride=1 (presumably to avoid shrinking the feature map). To accommodate this change, an atrous algorithm is adopted: conv6 uses a dilated (atrous) convolution. The dropout layers and the fc8 layer are then removed, a series of new convolutional layers is added, and the network is fine-tuned on the detection dataset.
The Conv4_3 layer of VGG16 serves as the first feature map used for detection. The conv4_3 feature map is 38×38, but because this layer sits relatively early in the network its activation norms are large, so an L2 Normalization layer is added after it.
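
Below is a minimal sketch of this fc-to-conv conversion (the shapes assume the SSD300 setting; the actual project additionally subsamples the pretrained fc weights with decimate() when building these layers):

import torch
import torch.nn as nn

# conv6 replaces fc6: a 3x3 atrous (dilated) convolution with dilation 6,
# which keeps a large receptive field without needing a 7x7 kernel.
conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
# conv7 replaces fc7: a 1x1 convolution.
conv7 = nn.Conv2d(1024, 1024, kernel_size=1)

x = torch.randn(1, 512, 19, 19)  # pool5 output for a 300x300 input (3x3, stride 1)
print(conv6(x).shape)            # torch.Size([1, 1024, 19, 19])
print(conv7(conv6(x)).shape)     # torch.Size([1, 1024, 19, 19])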

Prior Boxes

image

SSD borrows the idea of anchors from Faster R-CNN: each cell is assigned prior boxes of different scales and aspect ratios, and the predicted bounding boxes are regressed relative to these priors, which reduces training difficulty to some extent. In general, each cell is assigned several priors that differ in scale and aspect ratio. As the figure shows, each cell uses 4 different priors, and the cat and the dog in the image are each trained with the prior whose shape fits them best.
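
Below is a minimal sketch of how the priors for one feature map can be laid out (the numbers are assumptions matching the conv4_3 map of SSD300: scale 0.1, next map's scale 0.2, 4 priors per cell; the project's real prior generation lives in the model code):

import torch
from math import sqrt

fmap_size = 38                 # conv4_3 feature map of SSD300
scale = 0.1                    # this map's prior scale, as a fraction of image size
next_scale = 0.2               # the following map's scale
aspect_ratios = [1., 2., 0.5]

priors = []
for i in range(fmap_size):
    for j in range(fmap_size):
        cx = (j + 0.5) / fmap_size          # fractional center of this cell
        cy = (i + 0.5) / fmap_size
        for ratio in aspect_ratios:
            priors.append([cx, cy, scale * sqrt(ratio), scale / sqrt(ratio)])
        # extra prior for ratio 1 at an intermediate scale, as in the paper
        extra = sqrt(scale * next_scale)
        priors.append([cx, cy, extra, extra])

priors = torch.FloatTensor(priors).clamp_(0, 1)
print(priors.shape)  # torch.Size([5776, 4]) -- 38 * 38 * 4 priors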

Dataset

The structure of an object detection dataset differs greatly from the classification tasks studied before. A traditional classification dataset roughly contains
[image, label],
whereas object detection has to both regress the detection boxes and classify the objects inside them, so one sample of the dataset looks roughly like
[{'boxes': [[ground-truth coords 1], [ground-truth coords 2], ...], 'labels': [ground-truth label 1, ground-truth label 2, ...]}, ...]
Since the dataset has no fixed, regular shape, we usually store such data in JSON files.
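For example, a single (hypothetical) entry of TRAIN_objects.json, corresponding to one image, might look like this — the coordinates and label indices below are made up for illustration; the real entries are generated by create_data_lists(), and the keys match what the dataset class reads later:

{'boxes': [[156, 97, 351, 270], [4, 115, 86, 192]],   # one [x_min, y_min, x_max, y_max] per object
 'labels': [7, 7],                                    # class index of each box
 'difficulties': [0, 1]}                              # PASCAL VOC 'difficult' flag per object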

First, convert the txt files provided with the dataset into .json files, so that the rewritten dataset class below can easily load the data.
create_data_lists() links each image with its ground-truth boxes and the labels of those boxes, and stores them in JSON files.
Note: this function must be run once.

from utils import *
create_data_lists(voc07_path='./data1/VOC2007',output_folder='./json1/')
There are 200 training images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.

There are 200 validation images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.
import torch
from torch.utils.data import Dataset
import json
import os
from PIL import Image
from utils import transform


class PascalVOCDataset(Dataset):
    """
    A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.
    """

    def __init__(self, data_folder, split, keep_difficult=False):
        """
        :param data_folder: folder where data files are stored
        :param split: split, one of 'TRAIN' or 'TEST'
        :param keep_difficult: keep or discard objects that are considered difficult to detect?
        """
        self.split = split.upper()

        assert self.split in {'TRAIN', 'TEST'}

        self.data_folder = data_folder
        self.keep_difficult = keep_difficult

        # Read data files
        with open(os.path.join(data_folder, self.split + '_images.json'), 'r') as j:
            self.images = json.load(j)
        with open(os.path.join(data_folder, self.split + '_objects.json'), 'r') as j:
            self.objects = json.load(j)

        assert len(self.images) == len(self.objects)

    def __getitem__(self, i):
        # Read image
        image = Image.open(self.images[i], mode='r')
        image = image.convert('RGB')

        # Read objects in this image (bounding boxes, labels, difficulties)
        objects = self.objects[i]
        boxes = torch.FloatTensor(objects['boxes'])  # (n_objects, 4)
        labels = torch.LongTensor(objects['labels'])  # (n_objects)
        difficulties = torch.ByteTensor(objects['difficulties'])  # (n_objects)

        # Discard difficult objects, if desired
        if not self.keep_difficult:
            boxes = boxes[1 - difficulties]
            labels = labels[1 - difficulties]
            difficulties = difficulties[1 - difficulties]

        # Apply transformations
        image, boxes, labels, difficulties = transform(image, boxes, labels, difficulties, split=self.split)

        return image, boxes, labels, difficulties

    def __len__(self):
        return len(self.images)

    def collate_fn(self, batch):
        """
        Since each image may have a different number of objects, we need a collate function (to be passed to the DataLoader).

        This describes how to combine these tensors of different sizes. We use lists.

        Note: this need not be defined in this Class, can be standalone.

        :param batch: an iterable of N sets from __getitem__()
        :return: a tensor of images, lists of varying-size tensors of bounding boxes, labels, and difficulties
        """

        images = list()
        boxes = list()
        labels = list()
        difficulties = list()

        for b in batch:
            images.append(b[0])
            boxes.append(b[1])
            labels.append(b[2])
            difficulties.append(b[3])

        images = torch.stack(images, dim=0)

        return images, boxes, labels, difficulties  # tensor (N, 3, 300, 300), 3 lists of N tensors each

Now that the dataset class has been rewritten, let's look at the exact form in which the training data of an object detection task is stored.

data_folder = './json1/'
keep_difficult = True
batch_size = 1
workers = 1


train_dataset = PascalVOCDataset(data_folder,
                                     split='train',
                                     keep_difficult=keep_difficult)
val_dataset = PascalVOCDataset(data_folder,
                                   split='test',
                                   keep_difficult=keep_difficult)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                                               collate_fn=train_dataset.collate_fn, num_workers=workers,
                                               pin_memory=True)  
                                                # note that we're passing the collate function here
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=True,
                                             collate_fn=val_dataset.collate_fn, num_workers=workers,
                                             pin_memory=True)
    
# for data in train_loader:
#     images, boxes, labels, difficulties = data
#     print('images---->', images)
#     print('boxes---->', boxes)
#     print('labels---->',labels)
#     print('difficulties---->',difficulties)

Loss

The SSD loss has two parts: a confidence loss and a location loss. The confidence loss is the classification error of the bounding boxes and uses cross-entropy loss; the location loss is the regression error between the predicted box positions and the ground truth and uses smooth L1 loss.

For the location loss, the formulas are shown below, where g_j^{cx}, g_j^{cy}, g_j^{w}, g_j^{h} are the 4 position values of the j-th ground-truth bbox (center x and y coordinates, plus the bbox width and height), d_i^{cx}, d_i^{cy}, d_i^{w}, d_i^{h} are the 4 position values of the i-th prior, and \hat{g}_j^{cx}, \hat{g}_j^{cy}, \hat{g}_j^{w}, \hat{g}_j^{h} are the transform (or offset) values computed from ground-truth bbox j and prior i.

\hat{g}_{j}^{c x}=\left(g_{j}^{c x}-d_{i}^{c x}\right) / d_{i}^{w}, \hat{g}_{j}^{c y}=\left(g_{j}^{c y}-d_{i}^{c y}\right) / d_{i}^{h}

\hat{g}_{j}^{w}=\log \left(\frac{g_{j}^{w}}{d_{i}^{w}}\right), \hat{g}_{j}^{h}=\log \left(\frac{g_{j}^{h}}{d_{i}^{h}}\right)

Our goal is for the CNN to learn these transform (offset) values, i.e., to make its loc outputs approach them. Once the model is trained, at detection time we simply decode the loc values output by the CNN against the prior positions. The decoding formulas are shown below, where for the i-th prior, d_i^{cx}, d_i^{cy}, d_i^{w}, d_i^{h} are the prior's position values, l_{i}^{cx}, l_{i}^{cy}, l_{i}^{w}, l_{i}^{h} are the transform/offset values output by the model, and b_{i}^{cx}, b_{i}^{cy}, b_{i}^{w}, b_{i}^{h} are the position values of the detected object in the image.

b_{i}^{w}=d_{i}^{w}\exp{(l_{i}^{w})}, b_{i}^{h}=d_{i}^{h}\exp{(l_{i}^{h})}

b_{i}^{cx}=d_{i}^{w}l_{i}^{cx} + d_{i}^{cx}, b_{i}^{cy}= d_{i}^{h}l_{i}^{cy} + d_{i}^{cy}
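
Below is a minimal sketch of these encode/decode transforms (not the project's exact helpers: cxcy_to_gcxgcy() and gcxgcy_to_cxcy() in utils.py additionally fold in empirical variance scaling factors):

import torch

def encode(gt, priors):
    """gt, priors: (n, 4) tensors in (cx, cy, w, h); returns the offsets g-hat."""
    g_cxcy = (gt[:, :2] - priors[:, :2]) / priors[:, 2:]   # (g - d) / d_wh
    g_wh = torch.log(gt[:, 2:] / priors[:, 2:])            # log(g_wh / d_wh)
    return torch.cat([g_cxcy, g_wh], dim=1)

def decode(loc, priors):
    """loc: (n, 4) predicted offsets; returns boxes in (cx, cy, w, h)."""
    b_cxcy = priors[:, 2:] * loc[:, :2] + priors[:, :2]    # d_wh * l + d_cxcy
    b_wh = priors[:, 2:] * torch.exp(loc[:, 2:])           # d_wh * exp(l)
    return torch.cat([b_cxcy, b_wh], dim=1)

# Round trip: decoding the encoded ground truth recovers the original box.
priors = torch.tensor([[0.5, 0.5, 0.2, 0.2]])
gt = torch.tensor([[0.55, 0.48, 0.3, 0.1]])
print(decode(encode(gt, priors), priors))  # tensor([[0.5500, 0.4800, 0.3000, 0.1000]])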

The location loss is given below, where l^{m}, m \in \{cx, cy, w, h\} denotes the loc values the CNN outputs for each prior, and \hat{g}^{m} the transform values computed from ground-truth box j and prior i. x_{ij}^k \in \{0,1\} is an indicator: x_{ij}^k = 1 means prior i is matched to ground-truth box j and the class of ground-truth box j is k. Smooth L1 loss is used to push the loc values learned by the model toward the transform values derived from the priors and ground-truth boxes. Pos denotes the set of non-background priors (compute the IoU between each prior and each ground-truth box; a prior whose maximum IoU is below a certain threshold is treated as negative (background), otherwise as positive (non-background)).

L_{loc}(x, l, g)=\sum_{i \in Pos}^{N} \quad \sum_{m \in\{cx, cy, w, h\}} x_{i j}^{k} \operatorname{smooth}_{\mathrm{L1}}\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)

For the confidence loss, given below, x_{ij}^p \in \{0,1\} is an indicator: x_{ij}^p = 1 means prior i is matched to ground-truth box j and the class of ground-truth box j is p (i.e., the label). Cross-entropy loss is used directly to compute the confidence error. \vec{c_i} denotes the model's (softmax) confidence output over the classes for prior i. Pos is the set of non-background priors, and Neg is the set of background priors.
\begin{equation}
L_{conf} = \sum_{i \in Pos}x_{ij}^pCrossEntropy(\vec{c_i}, p) + \sum_{i \in Neg}CrossEntropy(\vec{c_i}, 0)
\end{equation}

In general, the number of background priors in object detection far exceeds the number of priors containing an object. To deal with this imbalance, the SSD code uses hard negative mining: among the negatives (priors treated as background), only the ones with the largest loss values are selected.
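
Below is a minimal standalone sketch of this selection on made-up loss values (the batched version inside MultiBoxLoss below does the same thing with sorting and rank masks):

import torch

# Keep all positives; among the negatives keep only the
# (neg_pos_ratio * n_positives) priors with the largest loss.
conf_loss_all = torch.tensor([0.2, 4.1, 0.7, 3.3, 0.1, 2.5])  # per-prior CE loss
positive = torch.tensor([False, True, False, False, False, True])
neg_pos_ratio = 1

n_hard = neg_pos_ratio * int(positive.sum())       # 2 hard negatives here
neg_loss = conf_loss_all.clone()
neg_loss[positive] = 0.                            # positives never count as negatives
_, idx = neg_loss.sort(descending=True)
hard_negatives = idx[:n_hard]
print(hard_negatives)                              # tensor([3, 2])

# As in the paper: sum over positives and hard negatives, divide by n_positives.
conf_loss = (conf_loss_all[positive].sum() + conf_loss_all[hard_negatives].sum()) / positive.sum()
print(conf_loss)                                   # tensor(5.3000)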

import torch.nn as nn
class MultiBoxLoss(nn.Module):
    """
    The MultiBox loss, a loss function for object detection.

    This is a combination of:
    (1) a localization loss for the predicted locations of the boxes, and
    (2) a confidence loss for the predicted class scores.
    """

    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        self.priors_cxcy = priors_cxcy
        self.priors_xy = cxcy_to_xy(priors_cxcy)
        self.threshold = threshold
        self.neg_pos_ratio = neg_pos_ratio
        self.alpha = alpha

        self.smooth_l1 = nn.SmoothL1Loss()
        self.cross_entropy = nn.CrossEntropyLoss(reduce=False)  # keep per-prior losses; newer PyTorch spells this reduction='none'

    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        """
        Forward propagation.

        :param predicted_locs: predicted locations/boxes w.r.t the 8732 prior boxes, a tensor of dimensions (N, 8732, 4)
        :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 8732, n_classes)
        :param boxes: true  object bounding boxes in boundary coordinates, a list of N tensors
        :param labels: true object labels, a list of N tensors
        :return: multibox loss, a scalar
        """
        batch_size = predicted_locs.size(0)
        n_priors = self.priors_cxcy.size(0)
        n_classes = predicted_scores.size(2)

        assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)

        true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 8732, 4)
        true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 8732)

        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)

            overlap = find_jaccard_overlap(boxes[i],
                                           self.priors_xy)  # (n_objects, 8732)

            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (8732)

            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).

            # To remedy this -
            # First, find the prior that has the maximum overlap for each object.
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)

            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)

            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.

            # Labels for each prior
            label_for_each_prior = labels[i][object_for_each_prior]  # (8732)
            # Set priors whose overlaps with objects are less than the threshold to be background (no object)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0  # (8732)

            # Store
            true_classes[i] = label_for_each_prior

            # Encode center-size object coordinates into the form we regressed predicted boxes to
            true_locs[i] = cxcy_to_gcxgcy(xy_to_cxcy(boxes[i][object_for_each_prior]), self.priors_cxcy)  # (8732, 4)

        # Identify priors that are positive (object/non-background)
        positive_priors = true_classes != 0  # (N, 8732)

        # LOCALIZATION LOSS

        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar

        # Note: indexing with a torch.uint8 (byte) tensor flattens the tensor when indexing is across multiple dimensions (N & 8732)
        # So, if predicted_locs has the shape (N, 8732, 4), predicted_locs[positive_priors] will have (total positives, 4)

        # CONFIDENCE LOSS

        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance

        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)

        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 8732)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 8732)

        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))

        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 8732)
        conf_loss_neg[positive_priors] = 0.  # (N, 8732), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 8732), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 8732)
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 8732)
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))

        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar

        
        return conf_loss + self.alpha * loc_loss
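
If you want to sanity-check the loss with dummy inputs, a minimal smoke test might look like the sketch below. Assumptions: utils.py provides the helper functions used above (find_jaccard_overlap, cxcy_to_xy, xy_to_cxcy, cxcy_to_gcxgcy), `device` is set, and the priors are random fractional (cx, cy, w, h) values — purely to exercise the shapes, not to produce a meaningful loss value.

from utils import *

device = torch.device('cpu')
priors = torch.rand(8732, 4).clamp(0.01, 0.99)           # fake (cx, cy, w, h) priors
criterion = MultiBoxLoss(priors_cxcy=priors)

predicted_locs = torch.randn(2, 8732, 4)                  # batch of 2 images
predicted_scores = torch.randn(2, 8732, 21)               # 21 classes (VOC + background)
boxes = [torch.FloatTensor([[0.1, 0.1, 0.4, 0.5]]),       # fractional boundary coordinates
         torch.FloatTensor([[0.2, 0.3, 0.6, 0.9]])]
labels = [torch.LongTensor([7]), torch.LongTensor([12])]

print(criterion(predicted_locs, predicted_scores, boxes, labels))  # a scalar tensor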

Model Training

The main difference between training an object detection model and training a classification model lies in the inputs to the loss function: a typical classification loss takes (predictions, labels), while the SSD loss takes (predicted box values, predicted class scores, ground-truth boxes, class labels).
Note: the code below is a simplified version of the train() function that omits some bookkeeping statistics; its purpose is to highlight the similarities and differences between training SSD and training a classification network.

Do NOT run the train_model() function!!!

def train_model(train_loader, model, criterion, optimizer, epoch):
    """
    One epoch's training.

    :param train_loader: DataLoader for training data
    :param model: model
    :param criterion: MultiBox loss
    :param optimizer: optimizer
    :param epoch: epoch number
    """
    model.train()  # training mode enables dropout

    # Batches
    for i, (images, boxes, labels, _) in enumerate(train_loader):

        # Move to default device
        images = images.to(device)  # (batch_size (N), 3, 300, 300)
        boxes = [b.to(device) for b in boxes]
        labels = [l.to(device) for l in labels]

        # Forward prop.
        predicted_locs, predicted_scores = model(images)  # (N, 8732, 4), (N, 8732, n_classes)

        # Loss
        loss = criterion(predicted_locs, predicted_scores, boxes, labels)  # scalar

        # Backward prop.
        optimizer.zero_grad()
        loss.backward()

        # Update model
        optimizer.step()

        # Print status
        if i % print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\tLoss {3:.4f}'.format(epoch, i, len(train_loader), loss.item()))
    # free some memory since their histories may be stored
    del predicted_locs, predicted_scores, images, boxes, labels  

Homework:

Complete the missing code in the training process (recall the steps for training a simple classification network from week 3).
To fill in:
the missing arguments of the loss function (read the loss code above to understand which arguments are needed to compute the loss)
backpropagation
updating the model parameters

Model parameter initialization

import time
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
from model import SSD300, MultiBoxLoss
from datasets import PascalVOCDataset
from utils import *
from utils1 import *

data_folder = './json1'  # folder with data files
keep_difficult = True  # use objects considered difficult to detect?

# Model parameters
# Not too many here since the SSD300 has a very specific structure
n_classes = len(label_map)  # number of different types of objects
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Learning parameters
checkpoint = "checkpoint_ssd300.pth.tar"  # path to model checkpoint, None if none
batch_size = 1  # batch size
start_epoch = 0  # start at this epoch
epochs = 5  # number of epochs to run without early-stopping
epochs_since_improvement = 0  # number of epochs since there was an improvement in the validation metric
best_loss = 100.  # assume a high loss at first
workers = 1  # number of workers for loading data in the DataLoader
print_freq = 20  # print training or validation status every __ batches
lr = 1e-3  # learning rate
momentum = 0.9  # momentum
weight_decay = 5e-4  # weight decay
grad_clip = None  # clip if gradients are exploding, which may happen at larger batch sizes (sometimes at 32) - you will recognize it by a sorting error in the MultiBox loss calculation

cudnn.benchmark = True

Model training and evaluation

This part is the main skeleton of the whole project.

def main():
    """
    Training and validation.
    """
    global epochs_since_improvement, start_epoch, label_map, best_loss, epoch, checkpoint
    
    optimizer, model = init_optimizer_and_model()
    
    # Move to default device
    model = model.to(device)
    criterion = MultiBoxLoss(priors_cxcy=model.priors_cxcy).to(device)
    
    # Epochs
    for epoch in range(start_epoch, epochs):
        # Paper describes decaying the learning rate at the 80000th, 100000th, 120000th 'iteration', i.e. model update or batch
        # The paper uses a batch size of 32, which means there were about 517 iterations in an epoch
        # Therefore, to find the epochs to decay at, you could do,
        # if epoch in {80000 // 517, 100000 // 517, 120000 // 517}:
        #     adjust_learning_rate(optimizer, 0.1)

        # In practice, I just decayed the learning rate when loss stopped improving for long periods,
        # and I would resume from the last best checkpoint with the new learning rate,
        # since there's no point in resuming at the most recent and significantly worse checkpoint.
        # So, when you're ready to decay the learning rate, just set checkpoint = 'BEST_checkpoint_ssd300.pth.tar' above
        # and have adjust_learning_rate(optimizer, 0.1) BEFORE this 'for' loop

        # One epoch's training
        train(train_loader=train_loader,
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              epoch=epoch)

        # One epoch's validation
        val_loss = validate(val_loader=val_loader,
                            model=model,
                            criterion=criterion)

        # Did validation loss improve?
        is_best = val_loss < best_loss
        best_loss = min(val_loss, best_loss)

        if not is_best:
            epochs_since_improvement += 1
            print("\nEpochs since last improvement: %d\n" % (epochs_since_improvement,))

        else:
            epochs_since_improvement = 0

        # Save checkpoint
        save_checkpoint(epoch, epochs_since_improvement, model, optimizer, val_loss, best_loss, is_best)
        
if __name__ == '__main__':
    main()
Loaded base model.



/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))


Epoch: [0][0/200]   Batch Time 1.336 (1.336)    Data Time 0.145 (0.145) Loss 21.1961 (21.1961)  
[0/200] Batch Time 0.127 (0.127)    Loss 4.8454 (4.8454)    

 * LOSS - 13.931

Epoch: [1][0/200]   Batch Time 0.188 (0.188)    Data Time 0.137 (0.137) Loss 7.1951 (7.1951)    
[0/200] Batch Time 0.127 (0.127)    Loss 53.9689 (53.9689)  

 * LOSS - 15.314

 Epochs since last improvement: 1

Epoch: [2][0/200]   Batch Time 0.201 (0.201)    Data Time 0.153 (0.153) Loss 58.9795 (58.9795)
[0/200] Batch Time 0.134 (0.134)    Loss 4.1666 (4.1666)

 * LOSS - 14.845

 Epochs since last improvement: 2

Epoch: [3][0/200]   Batch Time 0.185 (0.185)    Data Time 0.139 (0.139) Loss 3.7385 (3.7385)
[0/200] Batch Time 0.123 (0.123)    Loss 41.0693 (41.0693)

 * LOSS - 19.491

 Epochs since last improvement: 3

Epoch: [4][0/200]   Batch Time 0.192 (0.192)    Data Time 0.148 (0.148) Loss 4.8312 (4.8312)
[0/200] Batch Time 0.123 (0.123)    Loss 4.1246 (4.1246)

 * LOSS - 12.118

Object Detection

Use the trained model to detect the objects in an image, classify them, and display them with bounding boxes.
Modify the img_path variable to change the image to be detected.
The images available from the test set are listed in ./json1/TEST_images.json.

from detect import *
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
%matplotlib inline

if __name__ == '__main__':
    img_path = './data1/VOC2007/JPEGImages/000220.jpg'
    original_image = Image.open(img_path, mode='r')
    original_image = original_image.convert('RGB')
    img = detect(original_image, min_score=0.2, max_overlap=0.1, top_k=200)
    plt.imshow(img)
    plt.show()

image

Homework:

After checking the exact meaning of each parameter in the source code, try modifying the three parameters min_score, max_overlap, and top_k, and analyze how the detection results change.

image
  • A box is kept only if its class score exceeds the threshold min_score. If this threshold is too large, even correct detections may be filtered out, resulting in no boxes; if it is too small, too many low-confidence detections survive, resulting in many boxes.
  • Two boxes may overlap; how much overlap is tolerated is determined by max_overlap. Note that max_overlap only matters when there are at least 2 boxes. If max_overlap is large, overlapping boxes are treated leniently - two boxes may even coincide if you set max_overlap = 1. When max_overlap is small, boxes are forced to be more independent.
  • top_k caps the number of boxes displayed. If there is only one detection, a huge top_k has no effect, but a smaller top_k can compensate for inaccurate filtering by min_score.

Answers:

  1. min_score is the minimum threshold for a detection box to be recognized as a class and displayed. Lowering it produces many incorrect detection boxes; raising it too far makes even the correct boxes disappear.

  2. max_overlap is the maximum overlap allowed between two detection boxes. If it is turned up, two boxes may both appear in the image even though they actually describe the same object.

  3. top_k is the number of detection boxes displayed. Since this image originally contains only one detection box, the value has no effect as long as it is not set to 0.

Model Evaluation

Compute the model's classification precision. Because our dataset was trimmed down and only two of the VOC classes are used, only those two classes can have nonzero precision; all other classes score 0.

from eval import *
if __name__ == '__main__':
    evaluate(test_loader, model)
{'aeroplane': 0.0,
 'bicycle': 0.0,
 'bird': 0.0,
 'boat': 0.0,
 'bottle': 0.0,
 'bus': 0.0,
 'car': 0.162913516163826,
 'cat': 0.0,
 'chair': 0.0,
 'cow': 0.0,
 'diningtable': 0.0,
 'dog': 0.0,
 'horse': 0.0,
 'motorbike': 0.0,
 'person': 0.0,
 'pottedplant': 0.0,
 'sheep': 0.0,
 'sofa': 0.0,
 'train': 0.0,
 'tvmonitor': 0.0}

Mean Average Precision (mAP): 0.008

Supplementary notes on the code

The code above was extracted from the source and modified where necessary, mainly so that the training, evaluation, and detection stages of the object detection task could be presented with a reasonably clear logical structure. If you have basically mastered the content above, below is the correct way to run the source code as .py files.
Note: do not run the following code during the lab session. The parameters inside the source are the default ones for training this model; with them you can train a fairly good detector, but this consumes a lot of GPU resources. If you have your own GPU, you may run it on your own machine after class. Please do not waste shared GPU resources by running this code in class. Thank you for your cooperation.

import train

# train model
# Set the parameters you want in the train.py file
train.main()
import detect
# detect
# Set the parameters you want in the detect.py file
detect.main()
Loaded checkpoint from epoch 11. Best loss so far is 5.796.




<Figure size 640x480 with 1 Axes>
import eval

# evaluate the model
eval.main()

Further reading

This part offers more detailed code-level explanations for students who want to dig deeper into the SSD algorithm or are interested in doing object detection work. Object detection relies on a large number of data processing and statistics utility functions, most of which live in utils.py, so we give a rough overview of that file and pick out the more important parts to explain in detail.

A walkthrough of utils.py

  • The functions include parse_annotation(), create_data_lists(), decimate(), calculate_mAP(), xy_to_cxcy(), cxcy_to_xy(), cxcy_to_gcxgcy(), gcxgcy_to_cxcy(), find_intersection(), find_jaccard_overlap(), expand(), random_crop(), flip(), resize(), photometric_distort(), transform(),
    adjust_learning_rate(), accuracy(), save_checkpoint(), clip_gradient(), and so on.

  • Some of these functions are not discussed in detail here; please read them after class if you are interested. For example, the image augmentation functions expand(), random_crop(), flip(), resize(), photometric_distort(), and transform() are useful not only in object detection but also in classification and other areas.

  • adjust_learning_rate(), accuracy(), save_checkpoint(), clip_gradient(), and the like are general-purpose deep learning utilities and are not covered in detail either.

  • parse_annotation() mainly assists create_data_lists() in parsing the XML annotation files of the VOC2007 dataset, while create_data_lists() parses the raw VOC2007 dataset and generates the JSON files actually loaded during training: TRAIN_images.json, TRAIN_objects.json, TEST_images.json, TEST_objects.json, and label_map.json.

  • decimate() subsamples a tensor at regular intervals; it is used when converting the fully connected layers into convolutions, to achieve the atrous-convolution effect.

  • calculate_mAP() computes mAP (Mean Average Precision), which in recent years has become the standard metric for measuring object detection performance. Its core procedure is:

  • Sort all detection boxes by detection score
  • Compute the IoU of each detection box with all ground-truth boxes
  • Take the ground-truth box with the maximum IoU (max_IOU) as the basis for deciding whether this detection is correct, mark it TP or FP according to max_IOU, draw the PR curve, and finally average over the classes to obtain mAP (a sketch of the per-class AP computation follows the IoU functions below)
  • A good in-depth reference: mAP詳解 (a detailed explanation of mAP)
  • find_intersection() and find_jaccard_overlap() compute IoU, which was introduced in the earlier segmentation lab. Their implementation is as follows:
def find_intersection(set_1, set_2):
    """
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.

    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # PyTorch auto-broadcasts singleton dimensions
    lower_bounds = torch.max(set_1[:, :2].unsqueeze(1), set_2[:, :2].unsqueeze(0))  # (n1, n2, 2)
    upper_bounds = torch.min(set_1[:, 2:].unsqueeze(1), set_2[:, 2:].unsqueeze(0))  # (n1, n2, 2)
    intersection_dims = torch.clamp(upper_bounds - lower_bounds, min=0)  # (n1, n2, 2)
    return intersection_dims[:, :, 0] * intersection_dims[:, :, 1]  # (n1, n2)

After computing the intersection, computing IoU is simple: divide the intersection by the union.

def find_jaccard_overlap(set_1, set_2):
    """
    Find the Jaccard Overlap (IoU) of every box combination between two sets of boxes that are in boundary coordinates.

    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: Jaccard Overlap of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # Find intersections
    intersection = find_intersection(set_1, set_2)  # (n1, n2)

    # Find areas of each box in both sets
    areas_set_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
    areas_set_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)

    # Find the union
    # PyTorch auto-broadcasts singleton dimensions
    union = areas_set_1.unsqueeze(1) + areas_set_2.unsqueeze(0) - intersection  # (n1, n2)

    return intersection / union  # (n1, n2)
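
As mentioned in the calculate_mAP() notes above, here is a minimal sketch of the per-class AP computation with hypothetical TP/FP flags (the real implementation in utils.py also performs the detection-to-ground-truth matching and loops over all classes):

import torch

tp = torch.tensor([1., 1., 0., 1., 0.])   # TP/FP flags, detections already sorted by score
n_ground_truths = 4                        # ground-truth boxes of this class

cum_tp = torch.cumsum(tp, dim=0)           # tensor([1., 2., 2., 3., 3.])
cum_fp = torch.cumsum(1 - tp, dim=0)       # tensor([0., 0., 1., 1., 2.])
recall = cum_tp / n_ground_truths
precision = cum_tp / (cum_tp + cum_fp)

# 11-point interpolated AP, PASCAL VOC 2007 style
points = []
for t in torch.arange(0., 1.1, 0.1):
    mask = recall >= t
    points.append(precision[mask].max().item() if mask.any() else 0.)
print(sum(points) / len(points))           # ~0.68 for these made-up flags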

Non-Maximum Suppression (NMS)

NMS is an important algorithm in object detection. Its job is to remove the redundant boxes left over after the model's predictions, as shown below:

  • Before NMS


    image
  • After NMS


    image
  • Algorithm flow

  • Set an IoU threshold, say 0.5. For each class, pick the box with the highest score, call it box_best, and keep it
  • Compute the IoU of box_best with every remaining box; if the IoU > 0.5, discard that box (the two boxes likely cover the same object, so the higher-scoring one is kept)
  • From the remaining boxes, again pick the one with the highest score, and repeat
  • A simple example
  • Suppose the sliding windows are A, B, C, D, E, F, G, H, I, J, that A has the highest score, and that IoU > 0.7 means elimination. Round 1: compute IoU with A; B, E, G exceed 0.7 and are removed, leaving C, D, F, H, I, J. Round 2: suppose F has the highest score among C, D, F, H, I, J; compute IoU with F; D, H, I exceed 0.7 and are removed, leaving C, J. Round 3: suppose C scores higher than J; compute the IoU of J with C; if it exceeds 0.7, J is removed, and A, F, C are the selected windows.
  • The implementation of non-maximum suppression in SSD
def NMS(n_classes, predicted_scores, min_score, decoded_locs, max_overlap, image_boxes,
        image_labels, image_scores):
    # predicted_scores: (8732, n_classes) softmax scores for a single image
    # decoded_locs: (8732, 4) decoded boxes for the same image, in boundary coordinates
    for c in range(1, n_classes):
        # Keep only predicted boxes and scores where scores for this class are above the minimum score
        class_scores = predicted_scores[:, c]  # (8732)
        score_above_min_score = class_scores > min_score  # torch.uint8 (byte) tensor, for indexing
        n_above_min_score = score_above_min_score.sum().item()
        if n_above_min_score == 0:
            continue
        class_scores = class_scores[score_above_min_score]  # (n_qualified), n_min_score <= 8732
        class_decoded_locs = decoded_locs[score_above_min_score]  # (n_qualified, 4)

        # Sort predicted boxes and scores by scores
        class_scores, sort_ind = class_scores.sort(dim=0, descending=True)  # (n_qualified), (n_min_score)
        class_decoded_locs = class_decoded_locs[sort_ind]  # (n_min_score, 4)

        # Find the overlap between predicted boxes
        overlap = find_jaccard_overlap(class_decoded_locs, class_decoded_locs)  # (n_qualified, n_min_score)

        # Non-Maximum Suppression (NMS)

        # A torch.uint8 (byte) tensor to keep track of which predicted boxes to suppress
        # 1 implies suppress, 0 implies don't suppress
        suppress = torch.zeros((n_above_min_score), dtype=torch.uint8).to(device)  # (n_qualified)

        # Consider each box in order of decreasing scores
        for box in range(class_decoded_locs.size(0)):
            # If this box is already marked for suppression
            if suppress[box] == 1:
                continue

            # Suppress boxes whose overlaps (with this box) are greater than maximum overlap
            # Find such boxes and update suppress indices
            suppress = torch.max(suppress, overlap[box] > max_overlap)
            # The max operation retains previously suppressed boxes, like an 'OR' operation

            # Don't suppress this box, even though it has an overlap of 1 with itself
            suppress[box] = 0

        # Store only unsuppressed boxes for this class
        image_boxes.append(class_decoded_locs[1 - suppress])
        image_labels.append(torch.LongTensor((1 - suppress).sum().item() * [c]).to(device))
        image_scores.append(class_scores[1 - suppress])