Prerequisites:
- Be able to build a simple CNN with PyTorch
- Be familiar with how neural networks are trained and with common optimization methods
- Drawing on the theory lectures, understand the content and principles of several classic object detection algorithms (such as Faster R-CNN, YOLO, and SSD)
Notes:
- The code for this lab comes from an open-source project on GitHub: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection#training
- To make the code easier for students to follow, we have made minor modifications to that project.
- Because the code logic of an object detection task is fairly complex and there are many details to understand, we were fortunate to have Li Weipeng (李偉鵬) take part in designing this lab; he was involved throughout the preparation of the course material.
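Network structure
SSD uses VGG16 as its base model and adds extra convolutional layers on top of VGG16 to obtain more feature maps for detection. The SSD network structure is shown in the figure.
With VGG16 as the base model, the fully connected layers fc6 and fc7 are converted into a 3×3 convolutional layer conv6 and a 1×1 convolutional layer conv7, and the pooling layer pool5 is changed from 2×2 with stride=2 to 3×3 with stride=1 (presumably to avoid reducing the feature map size). To accommodate this change, the atrous algorithm is used, i.e. conv6 is a dilated (atrous) convolution. The dropout layers and the fc8 layer are then removed, a series of new convolutional layers is added, and the network is fine-tuned on the detection dataset.
The Conv4_3 layer of VGG16 provides the first feature map used for detection. The conv4_3 feature map is 38×38, but because this layer sits early in the network its feature norm is relatively large, so an L2 Normalization layer is added after it.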
Priors
SSD borrows the anchor idea from Faster R-CNN: each cell is assigned priors (default boxes) with different scales or aspect ratios, and the predicted bounding boxes are expressed relative to these priors, which makes training somewhat easier. In general, each cell has several priors that differ in scale and aspect ratio. In the figure, each cell uses 4 different priors, and the cat and the dog are each trained with the prior that best fits their shape.
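To make the number of priors concrete, the following is a minimal sketch that counts them for SSD300. The feature map sizes and the number of priors per cell are assumptions taken from the standard SSD300 configuration, not values stated in this handout; the total matches the 8732 that appears in the code comments later on.

```python
# Assumed SSD300 configuration (standard values, not stated in this handout):
# side length of each detection feature map and number of priors per cell.
feature_map_sizes = {"conv4_3": 38, "conv7": 19, "conv8_2": 10,
                     "conv9_2": 5, "conv10_2": 3, "conv11_2": 1}
priors_per_cell = {"conv4_3": 4, "conv7": 6, "conv8_2": 6,
                   "conv9_2": 6, "conv10_2": 4, "conv11_2": 4}


def count_priors(sizes, per_cell):
    """Return the number of priors on each feature map and the overall total."""
    per_map = {name: sizes[name] ** 2 * per_cell[name] for name in sizes}
    return per_map, sum(per_map.values())


per_map, total = count_priors(feature_map_sizes, priors_per_cell)
for name, n in per_map.items():
    print(name, n)
print("total priors:", total)
```

Most priors come from the large, early feature maps, which is why small objects are mainly handled by conv4_3.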
Dataset
A dataset for object detection is organized quite differently from the classification datasets studied earlier. A conventional classification dataset roughly contains
[image, label],
Because object detection must both regress the bounding boxes and classify the object inside each box, its dataset roughly takes the following form
[{'boxes': [[ground-truth coordinates 1], [ground-truth coordinates 2], ...]}, {'labels': [ground-truth label 1, ground-truth label 2, ...]}]
Because such data has no regular, fixed-size layout, we usually store it in JSON files.
First, the annotation files provided with the dataset are converted into .json files so that the custom Dataset class defined later can load the data easily.
The main job of create_data_lists() is to link each image with its ground-truth boxes and the label of each box, and to save the result to JSON files.
Note: this function must be run once.
from utils import *
create_data_lists(voc07_path='./data1/VOC2007',output_folder='./json1/')
There are 200 training images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.
There are 200 validation images containing a total of 600 objects. Files have been saved to /home/jovyan/Week8/json1.
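As an illustration of what these JSON files hold, here is a minimal sketch that writes and reads back one pair of files. The file-name pattern and the keys `boxes`, `labels`, and `difficulties` follow the Dataset class below; the sample path, coordinates, and label ids are made up, and `save_lists()` / `load_lists()` are hypothetical helpers, not functions from utils.py.

```python
import json
import os
import tempfile

# Hypothetical sample: one image containing two objects.
images = ["./data1/VOC2007/JPEGImages/000001.jpg"]
objects = [{"boxes": [[48, 240, 195, 371], [8, 12, 352, 498]],  # x_min, y_min, x_max, y_max
            "labels": [12, 15],
            "difficulties": [0, 0]}]


def save_lists(folder, split, images, objects):
    """Write two parallel lists: entry i of both files describes the same image."""
    assert len(images) == len(objects)
    with open(os.path.join(folder, split + "_images.json"), "w") as f:
        json.dump(images, f)
    with open(os.path.join(folder, split + "_objects.json"), "w") as f:
        json.dump(objects, f)


def load_lists(folder, split):
    """Read the two lists back, as a Dataset constructor would."""
    with open(os.path.join(folder, split + "_images.json"), "r") as f:
        loaded_images = json.load(f)
    with open(os.path.join(folder, split + "_objects.json"), "r") as f:
        loaded_objects = json.load(f)
    return loaded_images, loaded_objects


with tempfile.TemporaryDirectory() as folder:
    save_lists(folder, "TRAIN", images, objects)
    loaded_images, loaded_objects = load_lists(folder, "TRAIN")

print(loaded_images[0], len(loaded_objects[0]["boxes"]))
```

Keeping image paths and object annotations in two index-aligned lists lets each image carry a different number of boxes.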
import torch
from torch.utils.data import Dataset
import json
import os
from PIL import Image
from utils import transform
class PascalVOCDataset(Dataset):
    """
    A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.
    """

    def __init__(self, data_folder, split, keep_difficult=False):
        """
        :param data_folder: folder where data files are stored
        :param split: split, one of 'TRAIN' or 'TEST'
        :param keep_difficult: keep or discard objects that are considered difficult to detect?
        """
        self.split = split.upper()
        assert self.split in {'TRAIN', 'TEST'}
        self.data_folder = data_folder
        self.keep_difficult = keep_difficult
        # Read data files
        with open(os.path.join(data_folder, self.split + '_images.json'), 'r') as j:
            self.images = json.load(j)
        with open(os.path.join(data_folder, self.split + '_objects.json'), 'r') as j:
            self.objects = json.load(j)
        assert len(self.images) == len(self.objects)

    def __getitem__(self, i):
        # Read image
        image = Image.open(self.images[i], mode='r')
        image = image.convert('RGB')
        # Read objects in this image (bounding boxes, labels, difficulties)
        objects = self.objects[i]
        boxes = torch.FloatTensor(objects['boxes'])  # (n_objects, 4)
        labels = torch.LongTensor(objects['labels'])  # (n_objects)
        difficulties = torch.ByteTensor(objects['difficulties'])  # (n_objects)
        # Discard difficult objects, if desired
        if not self.keep_difficult:
            keep = difficulties == 0  # boolean mask of the non-difficult objects
            boxes = boxes[keep]
            labels = labels[keep]
            difficulties = difficulties[keep]
        # Apply transformations
        image, boxes, labels, difficulties = transform(image, boxes, labels, difficulties, split=self.split)
        return image, boxes, labels, difficulties

    def __len__(self):
        return len(self.images)

    def collate_fn(self, batch):
        """
        Since each image may have a different number of objects, we need a collate function (to be passed to the DataLoader).
        This describes how to combine these tensors of different sizes. We use lists.
        Note: this need not be defined in this Class, can be standalone.
        :param batch: an iterable of N sets from __getitem__()
        :return: a tensor of images, lists of varying-size tensors of bounding boxes, labels, and difficulties
        """
        images = list()
        boxes = list()
        labels = list()
        difficulties = list()
        for b in batch:
            images.append(b[0])
            boxes.append(b[1])
            labels.append(b[2])
            difficulties.append(b[3])
        images = torch.stack(images, dim=0)
        return images, boxes, labels, difficulties  # tensor (N, 3, 300, 300), 3 lists of N tensors each
With the Dataset class rewritten, let's look at the form in which the training data for an object detection task is actually stored.
data_folder = './json1/'
keep_difficult = True
batch_size = 1
workers = 1
train_dataset = PascalVOCDataset(data_folder,
split='train',
keep_difficult=keep_difficult)
val_dataset = PascalVOCDataset(data_folder,
split='test',
keep_difficult=keep_difficult)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
collate_fn=train_dataset.collate_fn, num_workers=workers,
pin_memory=True)
# note that we're passing the collate function here
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=True,
collate_fn=val_dataset.collate_fn, num_workers=workers,
pin_memory=True)
# for data in train_loader:
# images, boxes, labels, difficulties = data
# print('images---->', images)
# print('boxes---->', boxes)
# print('labels---->',labels)
# print('difficulties---->',difficulties)
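Loss
The SSD loss has two parts: the confidence loss and the location loss. The confidence loss is the classification error of each bbox and uses cross-entropy loss; the location loss is the regression error between the bbox position and the ground truth and uses smooth L1 loss.
For the location loss, $g_j^{cx}$, $g_j^{cy}$, $g_j^{w}$, $g_j^{h}$ are the 4 position values of the j-th ground truth bbox (center x and y coordinates, width, and height). $d_i^{cx}$, $d_i^{cy}$, $d_i^{w}$, $d_i^{h}$ are the 4 position values of the i-th prior (center x and y coordinates, width, and height). $\hat{g}_j^{cx}$, $\hat{g}_j^{cy}$, $\hat{g}_j^{w}$, $\hat{g}_j^{h}$ are the transform (or offset) values computed from ground truth bbox j and prior i:
\begin{equation}
\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad
\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}
\end{equation}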
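Our goal is for the CNN to learn these transform (offset) values, i.e. to make its output loc values approach them. Once the model is trained, detection only requires decoding the CNN's output loc values against the position values of the priors. For the i-th prior, $d_i^{cx}$, $d_i^{cy}$, $d_i^{w}$, $d_i^{h}$ are the position values of the prior, $l_i^{cx}$, $l_i^{cy}$, $l_i^{w}$, $l_i^{h}$ are the transform/offset values output by the model, and $b_i^{cx}$, $b_i^{cy}$, $b_i^{w}$, $b_i^{h}$ are the position values, in the image, of the detected object. The decoding formula is:
\begin{equation}
b_i^{cx} = d_i^{w} l_i^{cx} + d_i^{cx}, \quad
b_i^{cy} = d_i^{h} l_i^{cy} + d_i^{cy}, \quad
b_i^{w} = d_i^{w} \exp(l_i^{w}), \quad
b_i^{h} = d_i^{h} \exp(l_i^{h})
\end{equation}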
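The encode/decode relationship described above can be sketched in plain Python. This is a minimal illustration that assumes the SSD paper's parameterization (center offsets divided by the prior size, log ratios for width and height); the `cxcy_to_gcxgcy()` and `gcxgcy_to_cxcy()` functions in utils.py may apply additional empirical scale factors that are omitted here. The function names `encode()` and `decode()` and the sample boxes are hypothetical.

```python
import math


def encode(gt_cxcy, prior_cxcy):
    """Offsets of a ground-truth box (cx, cy, w, h) relative to a prior (cx, cy, w, h)."""
    cx, cy, w, h = gt_cxcy
    p_cx, p_cy, p_w, p_h = prior_cxcy
    return ((cx - p_cx) / p_w,
            (cy - p_cy) / p_h,
            math.log(w / p_w),
            math.log(h / p_h))


def decode(offsets, prior_cxcy):
    """Inverse of encode(): turn offsets back into a box (cx, cy, w, h)."""
    l_cx, l_cy, l_w, l_h = offsets
    p_cx, p_cy, p_w, p_h = prior_cxcy
    return (l_cx * p_w + p_cx,
            l_cy * p_h + p_cy,
            p_w * math.exp(l_w),
            p_h * math.exp(l_h))


prior = (0.5, 0.5, 0.2, 0.4)    # made-up prior in fractional coordinates
gt = (0.55, 0.45, 0.3, 0.3)     # made-up ground-truth box
offsets = encode(gt, prior)     # what the network is trained to output
recovered = decode(offsets, prior)
print(offsets)
print(recovered)
```

Decoding the encoded offsets returns the original box, which is why a network that predicts the offsets well also localizes the object well.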
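The location loss is given by the formula below, where $l_i^m$ is the loc value the CNN outputs for each prior and $\hat{g}_j^m$ is the transform value computed from ground truth box j and prior i. $x_{ij}^k$ is an indicator: $x_{ij}^k = 1$ means that prior i is matched to ground truth box j and that ground truth box j has class k.
\begin{equation}
L_{loc} = \sum_{i \in Pos} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, smooth_{L1}(l_i^m - \hat{g}_j^m)
\end{equation}
Smooth L1 loss is used so that the loc values learned by the model approach the transform values obtained from the priors and the ground truth boxes. Pos denotes the set of non-background priors: the IoU between every prior and every ground truth box is computed, and a prior whose largest IoU is below a threshold is treated as Negative (background); otherwise it is treated as Positive (non-background).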
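For reference, smooth L1 is quadratic for small errors and linear for large ones. The sketch below is a plain-Python illustration, assuming the transition point at |x| = 1 and averaging over all coordinates (which is how `nn.SmoothL1Loss` behaves with its default settings, whereas the formula above is written as a sum). The helper names and sample values are hypothetical.

```python
def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(predicted, target):
    """Mean smooth L1 over every coordinate of every positive prior."""
    diffs = [p - t
             for pred_box, true_box in zip(predicted, target)
             for p, t in zip(pred_box, true_box)]
    return sum(smooth_l1(d) for d in diffs) / len(diffs)


# One positive prior: predicted offsets versus target offsets (made-up numbers).
predicted = [(0.2, -0.1, 0.0, 2.0)]
target = [(0.0, 0.0, 0.0, 0.0)]
loss = localization_loss(predicted, target)
print(loss)
```

The linear branch keeps a single badly localized box (the error of 2.0 here) from dominating the gradient the way a squared error would.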
For the confidence loss, $x_{ij}^p$ is an indicator: $x_{ij}^p = 1$ means that prior i is matched to ground truth box j and that ground truth box j has class p (i.e. its label). Cross-entropy loss is used directly to compute the confidence error.
$\vec{c_i}$ denotes the model's confidence output (after softmax) over all classes for prior i. Pos is the set of non-background priors and Neg is the set of background priors.
\begin{equation}
L_{conf} = \sum_{i \in Pos}x_{ij}^pCrossEntropy(\vec{c_i}, p) + \sum_{i \in Neg}CrossEntropy(\vec{c_i}, 0)
\end{equation}
In object detection the background priors usually far outnumber the priors that contain an object. To deal with this imbalance, the SSD code uses hard negative mining: among the negatives (priors treated as background), only those with the largest loss values are kept.
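Hard negative mining for a single image can be sketched in plain Python as follows. This is a simplified illustration rather than the tensor code used in the loss class; the function name, the per-prior loss values, and the positive/negative flags are made up, and the 3:1 ratio mirrors the `neg_pos_ratio=3` default that appears in the code.

```python
def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negatives with the largest loss,
    keeping at most neg_pos_ratio * (number of positives) of them."""
    n_positives = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: losses[i], reverse=True)  # hardest first
    return negatives[:neg_pos_ratio * n_positives]


# Made-up per-prior confidence losses for one image with a single positive prior.
losses = [0.9, 0.1, 2.5, 0.3, 1.7, 0.05, 0.8, 0.2]
is_positive = [True, False, False, False, False, False, False, False]

hard = hard_negative_mining(losses, is_positive)
n_positives = sum(is_positive)
positive_loss = sum(l for l, pos in zip(losses, is_positive) if pos)
# Summed over positives and hard negatives, averaged over positives only.
conf_loss = (sum(losses[i] for i in hard) + positive_loss) / n_positives
print(hard, conf_loss)
```

Only the three hardest negatives contribute; the easy background priors are ignored, which keeps them from swamping the single positive.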
import torch.nn as nn
class MultiBoxLoss(nn.Module):
    """
    The MultiBox loss, a loss function for object detection.
    This is a combination of:
    (1) a localization loss for the predicted locations of the boxes, and
    (2) a confidence loss for the predicted class scores.
    """

    def __init__(self, priors_cxcy, threshold=0.5, neg_pos_ratio=3, alpha=1.):
        super(MultiBoxLoss, self).__init__()
        self.priors_cxcy = priors_cxcy
        self.priors_xy = cxcy_to_xy(priors_cxcy)
        self.threshold = threshold
        self.neg_pos_ratio = neg_pos_ratio
        self.alpha = alpha
        self.smooth_l1 = nn.SmoothL1Loss()
        # reduction='none' keeps one loss value per prior (replaces the deprecated reduce=False)
        self.cross_entropy = nn.CrossEntropyLoss(reduction='none')

    def forward(self, predicted_locs, predicted_scores, boxes, labels):
        """
        Forward propagation.
        :param predicted_locs: predicted locations/boxes w.r.t the 8732 prior boxes, a tensor of dimensions (N, 8732, 4)
        :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 8732, n_classes)
        :param boxes: true object bounding boxes in boundary coordinates, a list of N tensors
        :param labels: true object labels, a list of N tensors
        :return: multibox loss, a scalar
        """
        batch_size = predicted_locs.size(0)
        n_priors = self.priors_cxcy.size(0)
        n_classes = predicted_scores.size(2)
        assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)
        true_locs = torch.zeros((batch_size, n_priors, 4), dtype=torch.float).to(device)  # (N, 8732, 4)
        true_classes = torch.zeros((batch_size, n_priors), dtype=torch.long).to(device)  # (N, 8732)
        # For each image
        for i in range(batch_size):
            n_objects = boxes[i].size(0)
            overlap = find_jaccard_overlap(boxes[i],
                                           self.priors_xy)  # (n_objects, 8732)
            # For each prior, find the object that has the maximum overlap
            overlap_for_each_prior, object_for_each_prior = overlap.max(dim=0)  # (8732)
            # We don't want a situation where an object is not represented in our positive (non-background) priors -
            # 1. An object might not be the best object for all priors, and is therefore not in object_for_each_prior.
            # 2. All priors with the object may be assigned as background based on the threshold (0.5).
            # To remedy this -
            # First, find the prior that has the maximum overlap for each object.
            _, prior_for_each_object = overlap.max(dim=1)  # (N_o)
            # Then, assign each object to the corresponding maximum-overlap-prior. (This fixes 1.)
            object_for_each_prior[prior_for_each_object] = torch.LongTensor(range(n_objects)).to(device)
            # To ensure these priors qualify, artificially give them an overlap of greater than 0.5. (This fixes 2.)
            overlap_for_each_prior[prior_for_each_object] = 1.
            # Labels for each prior
            label_for_each_prior = labels[i][object_for_each_prior]  # (8732)
            # Set priors whose overlaps with objects are less than the threshold to be background (no object)
            label_for_each_prior[overlap_for_each_prior < self.threshold] = 0  # (8732)
            # Store
            true_classes[i] = label_for_each_prior
            # Encode center-size object coordinates into the form we regressed predicted boxes to
            true_locs[i] = cxcy_to_gcxgcy(xy_to_cxcy(boxes[i][object_for_each_prior]), self.priors_cxcy)  # (8732, 4)
        # Identify priors that are positive (object/non-background)
        positive_priors = true_classes != 0  # (N, 8732)
        # LOCALIZATION LOSS
        # Localization loss is computed only over positive (non-background) priors
        loc_loss = self.smooth_l1(predicted_locs[positive_priors], true_locs[positive_priors])  # (), scalar
        # Note: indexing with a mask tensor flattens the tensor when indexing is across multiple dimensions (N & 8732)
        # So, if predicted_locs has the shape (N, 8732, 4), predicted_locs[positive_priors] will have (total positives, 4)
        # CONFIDENCE LOSS
        # Confidence loss is computed over positive priors and the most difficult (hardest) negative priors in each image
        # That is, FOR EACH IMAGE,
        # we will take the hardest (neg_pos_ratio * n_positives) negative priors, i.e where there is maximum loss
        # This is called Hard Negative Mining - it concentrates on hardest negatives in each image, and also minimizes pos/neg imbalance
        # Number of positive and hard-negative priors per image
        n_positives = positive_priors.sum(dim=1)  # (N)
        n_hard_negatives = self.neg_pos_ratio * n_positives  # (N)
        # First, find the loss for all priors
        conf_loss_all = self.cross_entropy(predicted_scores.view(-1, n_classes), true_classes.view(-1))  # (N * 8732)
        conf_loss_all = conf_loss_all.view(batch_size, n_priors)  # (N, 8732)
        # We already know which priors are positive
        conf_loss_pos = conf_loss_all[positive_priors]  # (sum(n_positives))
        # Next, find which priors are hard-negative
        # To do this, sort ONLY negative priors in each image in order of decreasing loss and take top n_hard_negatives
        conf_loss_neg = conf_loss_all.clone()  # (N, 8732)
        conf_loss_neg[positive_priors] = 0.  # (N, 8732), positive priors are ignored (never in top n_hard_negatives)
        conf_loss_neg, _ = conf_loss_neg.sort(dim=1, descending=True)  # (N, 8732), sorted by decreasing hardness
        hardness_ranks = torch.LongTensor(range(n_priors)).unsqueeze(0).expand_as(conf_loss_neg).to(device)  # (N, 8732)
        hard_negatives = hardness_ranks < n_hard_negatives.unsqueeze(1)  # (N, 8732)
        conf_loss_hard_neg = conf_loss_neg[hard_negatives]  # (sum(n_hard_negatives))
        # As in the paper, averaged over positive priors only, although computed over both positive and hard-negative priors
        conf_loss = (conf_loss_hard_neg.sum() + conf_loss_pos.sum()) / n_positives.sum().float()  # (), scalar
        return conf_loss + self.alpha * loc_loss
Model training
The main difference between training an object detection model and training a classification model lies in the inputs to the loss function. A typical classification loss takes (predictions, labels), whereas the SSD loss takes (predicted box values, predicted class scores, ground-truth boxes, class labels).
Note: the code below is a simplified version of the train() function. It omits the bookkeeping of other statistics, mainly so that you can compare how training SSD resembles and differs from training a classification network.
Do not run the train_model() function!!!
def train_model(train_loader, model, criterion, optimizer, epoch):
    """
    One epoch's training.
    :param train_loader: DataLoader for training data
    :param model: model
    :param criterion: MultiBox loss
    :param optimizer: optimizer
    :param epoch: epoch number
    """
    model.train()  # training mode enables dropout
    # Batches
    for i, (images, boxes, labels, _) in enumerate(train_loader):
        # Move to default device
        images = images.to(device)  # (batch_size (N), 3, 300, 300)
        boxes = [b.to(device) for b in boxes]
        labels = [l.to(device) for l in labels]
        # Forward prop.
        predicted_locs, predicted_scores = model(images)  # (N, 8732, 4), (N, 8732, n_classes)
        # Loss
        loss = criterion(predicted_locs, predicted_scores, boxes, labels)  # scalar
        # Backward prop.
        optimizer.zero_grad()
        loss.backward()
        # Update model
        optimizer.step()
        # Print status (the running-average meter of the full train() is omitted in this simplified version)
        if i % print_freq == 0:
            print('Loss {:.4f}'.format(loss.item()))
    # free some memory since their histories may be stored
    del predicted_locs, predicted_scores, images, boxes, labels
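Homework:
Fill in the code missing from the training procedure (recall the steps used in week 3 to train a simple classification network).
To be added:
The missing arguments of the loss function (read the loss code above to understand which arguments are needed to compute the loss)
Backpropagation
Updating the model parameters
Initializing the model parameters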
import time
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
from model import SSD300, MultiBoxLoss
from datasets import PascalVOCDataset
from utils import *
from utils1 import *
data_folder = './json1' # folder with data files
keep_difficult = True # use objects considered difficult to detect?
# Model parameters
# Not too many here since the SSD300 has a very specific structure
n_classes = len(label_map) # number of different types of objects
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Learning parameters
checkpoint = "checkpoint_ssd300.pth.tar" # path to model checkpoint, None if none
batch_size = 1 # batch size
start_epoch = 0 # start at this epoch
epochs = 5 # number of epochs to run without early-stopping
epochs_since_improvement = 0 # number of epochs since there was an improvement in the validation metric
best_loss = 100. # assume a high loss at first
workers = 1 # number of workers for loading data in the DataLoader
print_freq = 20 # print training or validation status every __ batches
lr = 1e-3 # learning rate
momentum = 0.9 # momentum
weight_decay = 5e-4 # weight decay
grad_clip = None # clip if gradients are exploding, which may happen at larger batch sizes (sometimes at 32) - you will recognize it by a sorting error in the MultiBox loss calculation
cudnn.benchmark = True
Model training and evaluation
This part is the main body of the whole project.
def main():
    """
    Training and validation.
    """
    global epochs_since_improvement, start_epoch, label_map, best_loss, epoch, checkpoint
    optimizer, model = init_optimizer_and_model()
    # Move to default device
    model = model.to(device)
    criterion = MultiBoxLoss(priors_cxcy=model.priors_cxcy).to(device)
    # Epochs
    for epoch in range(start_epoch, epochs):
        # Paper describes decaying the learning rate at the 80000th, 100000th, 120000th 'iteration', i.e. model update or batch
        # The paper uses a batch size of 32, which means there were about 517 iterations in an epoch
        # Therefore, to find the epochs to decay at, you could do,
        # if epoch in {80000 // 517, 100000 // 517, 120000 // 517}:
        #     adjust_learning_rate(optimizer, 0.1)
        # In practice, I just decayed the learning rate when loss stopped improving for long periods,
        # and I would resume from the last best checkpoint with the new learning rate,
        # since there's no point in resuming at the most recent and significantly worse checkpoint.
        # So, when you're ready to decay the learning rate, just set checkpoint = 'BEST_checkpoint_ssd300.pth.tar' above
        # and have adjust_learning_rate(optimizer, 0.1) BEFORE this 'for' loop
        # One epoch's training
        train(train_loader=train_loader,
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              epoch=epoch)
        # One epoch's validation
        val_loss = validate(val_loader=val_loader,
                            model=model,
                            criterion=criterion)
        # Did validation loss improve?
        is_best = val_loss < best_loss
        best_loss = min(val_loss, best_loss)
        if not is_best:
            epochs_since_improvement += 1
            print("\nEpochs since last improvement: %d\n" % (epochs_since_improvement,))
        else:
            epochs_since_improvement = 0
        # Save checkpoint
        save_checkpoint(epoch, epochs_since_improvement, model, optimizer, val_loss, best_loss, is_best)


if __name__ == '__main__':
    main()
Loaded base model.
/opt/conda/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Epoch: [0][0/200] Batch Time 1.336 (1.336) Data Time 0.145 (0.145) Loss 21.1961 (21.1961)
[0/200] Batch Time 0.127 (0.127) Loss 4.8454 (4.8454)
* LOSS - 13.931
Epoch: [1][0/200] Batch Time 0.188 (0.188) Data Time 0.137 (0.137) Loss 7.1951 (7.1951)
[0/200] Batch Time 0.127 (0.127) Loss 53.9689 (53.9689)
* LOSS - 15.314
Epochs since last improvement: 1
Epoch: [2][0/200] Batch Time 0.201 (0.201) Data Time 0.153 (0.153) Loss 58.9795 (58.9795)
[0/200] Batch Time 0.134 (0.134) Loss 4.1666 (4.1666)
* LOSS - 14.845
Epochs since last improvement: 2
Epoch: [3][0/200] Batch Time 0.185 (0.185) Data Time 0.139 (0.139) Loss 3.7385 (3.7385)
[0/200] Batch Time 0.123 (0.123) Loss 41.0693 (41.0693)
* LOSS - 19.491
Epochs since last improvement: 3
Epoch: [4][0/200] Batch Time 0.192 (0.192) Data Time 0.148 (0.148) Loss 4.8312 (4.8312)
[0/200] Batch Time 0.123 (0.123) Loss 4.1246 (4.1246)
* LOSS - 12.118
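Object detection
Use the trained model to detect and classify the objects in an image, and display them with bounding boxes.
Change the img_path variable to change the image to be detected.
The images available in the test set are listed in ./json1/TEST_images.json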
from detect import *
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
%matplotlib inline
if __name__ == '__main__':
    img_path = './data1/VOC2007/JPEGImages/000220.jpg'
    original_image = Image.open(img_path, mode='r')
    original_image = original_image.convert('RGB')
    img = detect(original_image, min_score=0.2, max_overlap=0.1, top_k=200)
    plt.imshow(img)
    plt.show()
Homework:
After checking the meaning of each parameter in the source code, try changing the three parameters min_score, max_overlap, and top_k, and analyze how the detection results change.
- min_score is the minimum class score a predicted box must reach to be kept. If this threshold is too large, no prediction passes it and no boxes are shown. If it is too small, many low-confidence predictions pass and many boxes are shown.
- Two boxes may overlap, and max_overlap determines how much overlap is tolerated. Note that max_overlap only matters when there are at least two boxes. If max_overlap is large, overlapping boxes are treated leniently; with max_overlap = 1. two boxes may even be nearly identical. When max_overlap is small, the remaining boxes are more independent of each other.
- top_k limits the number of boxes that are kept. If there is only one detection, a huge top_k has no effect. A smaller top_k can partly compensate for a min_score that lets too many boxes through.
Answer:
min_score is the minimum score a box must reach to be regarded as belonging to a class and displayed. If it is lowered, many incorrect boxes appear; if it is raised too far, even the correct boxes disappear.
max_overlap is the maximum overlap allowed between two boxes. If it is raised, two boxes can appear in the image that actually describe the same object.
top_k is the number of boxes displayed. Because the image originally has only one box, changing this value has no effect as long as it is not set to 0.
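Model evaluation
Compute the per-class average precision of the model. Because our dataset was trimmed down to only two of the VOC classes, at most those two classes can have a nonzero value; every other class scores 0.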
from eval import *
if __name__ == '__main__':
    evaluate(test_loader, model)
{'aeroplane': 0.0,
'bicycle': 0.0,
'bird': 0.0,
'boat': 0.0,
'bottle': 0.0,
'bus': 0.0,
'car': 0.162913516163826,
'cat': 0.0,
'chair': 0.0,
'cow': 0.0,
'diningtable': 0.0,
'dog': 0.0,
'horse': 0.0,
'motorbike': 0.0,
'person': 0.0,
'pottedplant': 0.0,
'sheep': 0.0,
'sofa': 0.0,
'train': 0.0,
'tvmonitor': 0.0}
Mean Average Precision (mAP): 0.008
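Additional notes on the code
The code above was extracted from the source code and modified where necessary, mainly to present the training, evaluation, and detection steps of an object detection task with a clear logical structure. Once you have a basic grasp of the material above, the following shows the proper way to run the source code as .py files.
Note: do not run the code below during the lab session. The parameters in the source code are the default parameters for training this model; they can train a reasonably good detection model but consume a large amount of GPU resources. If you have your own GPU, you may run the code on your own device after class. Please do not use this code in class and waste GPU resources. Thank you for your cooperation.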
import train
# train model
# Setting the parameters you want in train.py file
train.main()
import detect
# detect
# Setting the parameters you want in detect.py file
detect.main()
Loaded checkpoint from epoch 11. Best loss so far is 5.796.
<Figure size 640x480 with 1 Axes>
import eval
# evaluate the model
eval.main()
After-class reading
This part gives a more detailed explanation of the code for students who want to learn more about the SSD algorithm and are interested in object detection. Object detection relies on many utility functions for data processing and statistics, and most of them live in utils.py, so we give an overview of that file and explain the more important parts in detail.
Reading utils.py
The file contains the functions parse_annotation(), create_data_lists(), decimate(), calculate_mAP(), xy_to_cxcy(), cxcy_to_xy(), cxcy_to_gcxgcy(), gcxgcy_to_cxcy(), find_intersection(), find_jaccard_overlap(), expand(), random_crop(), flip(), resize(), photometric_distort(), transform(),
adjust_learning_rate(), accuracy(), save_checkpoint(), clip_gradient(), and others. Some of them are not covered in detail here, and we hope those who are interested will read them carefully after class, for example the image augmentation functions expand(), random_crop(), flip(), resize(), photometric_distort(), and transform(), which are useful not only for object detection but also for classification and other areas.
adjust_learning_rate(), accuracy(), save_checkpoint(), and clip_gradient() are general-purpose deep learning utilities and are likewise not covered in detail.
parse_annotation() mainly helps create_data_lists() parse the XML files of the VOC2007 dataset, while create_data_lists() parses the raw VOC2007 dataset and generates the JSON files that are loaded during training: TRAIN_images.json, TRAIN_objects.json, TEST_images.json, TEST_objects.json, and label_map.json.
decimate() subsamples at regular intervals when the fully connected layers are converted into convolutional layers, in order to achieve the effect of atrous (dilated) convolution.
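The idea of subsampling at regular intervals can be illustrated on a small 2-D grid. This is only a plain-Python analogy under stated assumptions: the real decimate() in utils.py operates on weight tensors, and the 7×7 to 3×3 reduction with a step of 3 is used here as an illustrative case. `decimate_2d()` is a hypothetical helper.

```python
def decimate_2d(kernel, step):
    """Keep every `step`-th row and every `step`-th column of a 2-D kernel."""
    return [row[::step] for row in kernel[::step]]


# A 7x7 grid whose entries are their own flat index, so the kept positions are visible.
kernel = [[r * 7 + c for c in range(7)] for r in range(7)]
small = decimate_2d(kernel, 3)
for row in small:
    print(row)
```

The kept entries are spread evenly over the original grid, which is why the smaller kernel, applied with a matching dilation, covers roughly the same receptive field.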
calculate_mAP() computes the mAP (Mean Average Precision), which in recent years has been an important metric for evaluating object detection algorithms. Its core idea is as follows:
- Sort all detection_boxes by detection_score
- Compute the IoU between each detection_box and all groundtruth_boxes
- Use the groundtruth_box with the largest IoU (max_IOU) to judge whether the detection_box's prediction is correct. Based on max_IOU, mark the prediction as TP or FP, draw the PR curve and compute the AP of each class from it, and finally average over the classes to get the mAP.
- A good link for further reading: mAP詳解 (mAP explained)
- find_intersection() and find_jaccard_overlap() compute the IoU, which was introduced in the earlier segmentation lab. They are implemented as follows:
def find_intersection(set_1, set_2):
    """
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """
    # PyTorch auto-broadcasts singleton dimensions
    lower_bounds = torch.max(set_1[:, :2].unsqueeze(1), set_2[:, :2].unsqueeze(0))  # (n1, n2, 2)
    upper_bounds = torch.min(set_1[:, 2:].unsqueeze(1), set_2[:, 2:].unsqueeze(0))  # (n1, n2, 2)
    intersection_dims = torch.clamp(upper_bounds - lower_bounds, min=0)  # (n1, n2, 2)
    return intersection_dims[:, :, 0] * intersection_dims[:, :, 1]  # (n1, n2)
Once the intersection has been computed, the IoU is straightforward: divide the intersection area by the union area.
def find_jaccard_overlap(set_1, set_2):
    """
    Find the Jaccard Overlap (IoU) of every box combination between two sets of boxes that are in boundary coordinates.
    :param set_1: set 1, a tensor of dimensions (n1, 4)
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: Jaccard Overlap of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """
    # Find intersections
    intersection = find_intersection(set_1, set_2)  # (n1, n2)
    # Find areas of each box in both sets
    areas_set_1 = (set_1[:, 2] - set_1[:, 0]) * (set_1[:, 3] - set_1[:, 1])  # (n1)
    areas_set_2 = (set_2[:, 2] - set_2[:, 0]) * (set_2[:, 3] - set_2[:, 1])  # (n2)
    # Find the union
    # PyTorch auto-broadcasts singleton dimensions
    union = areas_set_1.unsqueeze(1) + areas_set_2.unsqueeze(0) - intersection  # (n1, n2)
    return intersection / union  # (n1, n2)
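For a single pair of boxes the same computation can be checked by hand. The sketch below is a plain-Python version of the IoU for two boxes in boundary coordinates (x_min, y_min, x_max, y_max); the function name `iou()` and the sample boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at 0 when the boxes do not touch.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union


# Two 2x2 squares that share a 1x1 corner: intersection 1, union 4 + 4 - 1 = 7.
overlapping = iou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
identical = iou((0, 0, 2, 2), (0, 0, 2, 2))
print(overlapping, disjoint, identical)
```

The tensor version above does the same arithmetic for every pair of boxes at once by broadcasting.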
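Non-Maximum Suppression (NMS)
NMS is an important algorithm in object detection; it removes the redundant boxes left after the model's predictions. See the figures below:
- Before NMS (image)
- After NMS (image)
Algorithm
- Set an IoU threshold, say 0.5. Within each class, select the box with the highest score, call it box_best, and keep it
- Compute the IoU between box_best and each remaining box; if the IoU > 0.5, discard that box (the two boxes probably represent the same object, so the higher-scoring one is kept)
- From the boxes that remain, again pick the one with the highest score, and repeat
- A simple example
- Suppose the sliding windows are A, B, C, D, E, F, G, H, I, J, that A has the highest score, and that windows with IoU > 0.7 are eliminated. Round 1: compute the IoU with A; B, E, G exceed 0.7 and are removed, leaving C, D, F, H, I, J. Round 2: suppose F has the highest score among C, D, F, H, I, J; compute the IoU with F; D, H, I exceed 0.7 and are removed, leaving C, J. Round 3: suppose C has the higher score of C and J; compute the IoU of J with C; if it exceeds 0.7, J is removed and A, F, C are the selected windows.
- Implementation of non-maximum suppression in SSD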
def NMS(n_classes, predicted_scores, min_score, decoded_locs, max_overlap, image_boxes,
        image_labels, image_scores):
    # Note: `i` (the index of the current image in the batch) is not a parameter here;
    # it is assumed to be defined by the loop this snippet was extracted from.
    for c in range(1, n_classes):
        # Keep only predicted boxes and scores where scores for this class are above the minimum score
        class_scores = predicted_scores[i][:, c]  # (8732)
        score_above_min_score = class_scores > min_score  # mask tensor, for indexing
        n_above_min_score = score_above_min_score.sum().item()
        if n_above_min_score == 0:
            continue
        class_scores = class_scores[score_above_min_score]  # (n_qualified), n_min_score <= 8732
        class_decoded_locs = decoded_locs[score_above_min_score]  # (n_qualified, 4)
        # Sort predicted boxes and scores by scores
        class_scores, sort_ind = class_scores.sort(dim=0, descending=True)  # (n_qualified), (n_min_score)
        class_decoded_locs = class_decoded_locs[sort_ind]  # (n_min_score, 4)
        # Find the overlap between predicted boxes
        overlap = find_jaccard_overlap(class_decoded_locs, class_decoded_locs)  # (n_qualified, n_min_score)
        # Non-Maximum Suppression (NMS)
        # A torch.uint8 (byte) tensor to keep track of which predicted boxes to suppress
        # 1 implies suppress, 0 implies don't suppress
        suppress = torch.zeros((n_above_min_score), dtype=torch.uint8).to(device)  # (n_qualified)
        # Consider each box in order of decreasing scores
        for box in range(class_decoded_locs.size(0)):
            # If this box is already marked for suppression
            if suppress[box] == 1:
                continue
            # Suppress boxes whose overlaps (with this box) are greater than maximum overlap
            # Find such boxes and update suppress indices
            suppress = torch.max(suppress, overlap[box] > max_overlap)
            # The max operation retains previously suppressed boxes, like an 'OR' operation
            # Don't suppress this box, even though it has an overlap of 1 with itself
            suppress[box] = 0
        # Store only unsuppressed boxes for this class
        image_boxes.append(class_decoded_locs[1 - suppress])
        image_labels.append(torch.LongTensor((1 - suppress).sum().item() * [c]).to(device))
        image_scores.append(class_scores[1 - suppress])
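The greedy procedure described in the algorithm steps can also be sketched in plain Python for a single class. This is a simplified illustration, not the tensor implementation above; the function names `iou()` and `nms()`, the boxes, and the scores are made up.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)


def nms(boxes, scores, max_overlap=0.5):
    """Greedy NMS: repeatedly keep the best remaining box and drop those overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        keep.append(best)
        # Discard every remaining box whose IoU with `best` exceeds the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= max_overlap]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept_strict = nms(boxes, scores, max_overlap=0.5)
kept_lenient = nms(boxes, scores, max_overlap=0.7)
print(kept_strict, kept_lenient)
```

The first two boxes have an IoU of about 0.68, so the second is suppressed at a threshold of 0.5 but survives at 0.7, which is the effect of raising max_overlap in the detection homework.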