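GitHub repository: https://github.com/BobLiu20/YOLOv3_PyTorch
Reference post: "yolo系列之yolo v3" (The YOLO series: YOLO v3)
I studied YOLOv3 mainly from that post together with the code.
YOLO_V1 and YOLO_V2 are not covered here; see the links in the post above.
The network design follows the same ideas as its predecessors. The core elements are: backbone, BN, leaky ReLU, logistic regression, multi-scale prediction, end-to-end training, and "divide and conquer".
The Faster R-CNN family first extracts candidate regions with a region-proposal step and then refines those candidates.
The YOLO family assumes that every grid cell may contain an object and runs detection on every cell, which is similar to the back-end network of Faster R-CNN.
yolo_v3 network structure
The figure above is the hard work of the referenced post's author; please get that author's permission before reusing it.
Model visualization tool recommended by that author: Netron
Network modules: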
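DBL: as shown at the bottom left of Figure 1, DBL is conv + BN + leaky ReLU.
resn: n is a number (res1, ..., res8, and so on) that says how many res_units a res_block contains. The detailed structure of this module is described later. The five res blocks contain 1, 2, 8, 8 and 4 units respectively.
concat: tensor concatenation. Two tensors are joined along the channel dimension, so only channel changes; batch, width and height stay the same.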
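The unit counts 1, 2, 8, 8, 4 are where the "53" in the backbone's name comes from. The sketch below is only a counting exercise. It assumes the usual Darknet-53 layout, which the text above does not spell out: one stem convolution, one stride-2 convolution opening each stage, and two convolutions (1×1 then 3×3) per res_unit. The function name is made up for illustration.

```python
# Residual-unit counts per stage, as listed above.
RES_UNITS = [1, 2, 8, 8, 4]


def count_backbone_convs(res_units):
    """Count convolution layers under the assumed Darknet-53 layout."""
    stem = 1                       # the first 3x3 convolution
    downsample = len(res_units)    # one stride-2 convolution opens each stage
    residual = 2 * sum(res_units)  # assumed: each res_unit = 1x1 conv + 3x3 conv
    return stem + downsample + residual


print(count_backbone_convs(RES_UNITS))  # 52
```

Under those assumptions the backbone has 52 convolutions; the 53rd layer of the classification network is its fully connected layer, which the detector does not use.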
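1. darknet53
v3 has no pooling layers and no fully connected layers; tensor sizes change only through the stride of the convolutions. A stride of 2 halves both the width and the height. darknet53 downsamples five times, so the final feature map is 1/32 of the input size along each side.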
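As a quick check of the 1/32 figure, the sketch below halves an input side five times and keeps the last three sizes. The 416-pixel input is an assumption chosen for illustration (the text does not fix an input size), and the function name is hypothetical.

```python
def feature_map_sizes(input_size, num_downsamples=5, num_scales=3):
    """Halve the side length once per stride-2 convolution.

    Returns the side lengths of the last `num_scales` stages, coarsest first.
    """
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        size //= 2
        sizes.append(size)
    return sizes[-num_scales:][::-1]


print(feature_map_sizes(416))  # [13, 26, 52]
```

For a 416×416 input the coarsest map is 13×13, which is 416 / 32.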
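2. output
yolov3 outputs feature maps at three different scales, labeled y1, y2 and y3 in the figure. This is what the paper calls "predictions across scales".
y1, y2 and y3 all have 255 channels. The COCO dataset has 80 classes, so each box carries one probability per class. Each grid cell predicts three boxes, and each box needs five basic parameters (x, y, w, h, confidence), which gives 3 × (5 + 80) = 255 values per cell. YOLOv3 predicts positions relatively: it predicts the offset of the b-box center from the top-left corner of the grid cell that contains it. confidence is the probability that the box contains foreground, i.e. an object.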
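To make the 255 concrete, here is a small sketch of how the channels can be split per box. It assumes the channels are laid out anchor by anchor, with (x, y, w, h, confidence) first and the class scores after them; that ordering matches the `view(bs, num_anchors, bbox_attrs, in_h, in_w)` call in the loss code later in this post. The helper name and dictionary keys are made up for illustration.

```python
NUM_ANCHORS = 3            # boxes predicted per grid cell
NUM_CLASSES = 80           # COCO
BOX_ATTRS = 5 + NUM_CLASSES


def channel_slices(num_anchors=NUM_ANCHORS, box_attrs=BOX_ATTRS):
    """Return, per anchor, the half-open channel ranges of each field."""
    layout = []
    for a in range(num_anchors):
        start = a * box_attrs
        layout.append({
            "anchor": a,
            "xywh": (start, start + 4),
            "conf": start + 4,
            "classes": (start + 5, start + box_attrs),
        })
    return layout


print(NUM_ANCHORS * BOX_ATTRS)  # 255
for entry in channel_slices():
    print(entry)
```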
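3. loss function
v3 uses logistic regression to score the content of each box. Every class gets its own independent score, so one object can carry several labels at once: it can be "person" and also "woman".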
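The difference between independent logistic scores and a softmax can be seen with a few numbers. This is a minimal sketch; the logits and class names are invented for illustration and do not come from the model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for one box.
logits = {"person": 3.0, "woman": 2.5, "car": -4.0}

# Logistic regression: every class is scored on its own.
independent = {name: sigmoid(z) for name, z in logits.items()}
# Softmax: the classes compete for a total probability of 1.
competing = dict(zip(logits, softmax(list(logits.values()))))

labels = [name for name, p in independent.items() if p > 0.5]
print(labels)  # ['person', 'woman']
print(competing)
```

With sigmoids both "person" and "woman" pass a 0.5 threshold, whereas softmax forces the two to share the probability mass.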
Now for the fun part: the code.
1. Network model
Training script: training/training.py
# Build the network from the config
net = ModelMain(config, is_training=is_training)
# Put the network into training mode
net.train(is_training)
Network model file: nets/model_main.py
- 1. DBL module
# A convolution block that keeps the feature-map height and width and changes only C.
# The stride is always 1, so the block simply adds non-linear capacity.
# _in   number of input channels
# _out  number of output channels
# ks    kernel size
def _make_cbl(self, _in, _out, ks):
    ''' cbl = conv + batch_norm + leaky_relu
    '''
    # "same" padding for odd kernel sizes
    pad = (ks - 1) // 2 if ks else 0
    return nn.Sequential(OrderedDict([
        ("conv", nn.Conv2d(_in, _out, kernel_size=ks, stride=1, padding=pad, bias=False)),
        ("bn", nn.BatchNorm2d(_out)),
        ("relu", nn.LeakyReLU(0.1)),
    ]))
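Why `pad = (ks - 1) // 2` keeps the feature-map size can be checked with the standard convolution output-size formula, floor((size + 2·pad − ks) / stride) + 1. The sketch below is plain arithmetic, not repo code; the function name is hypothetical and dilation is assumed to be 1.

```python
def conv_out_size(size, ks, stride=1, pad=None):
    """Output side length of a convolution (dilation assumed to be 1)."""
    if pad is None:
        # Same rule as in _make_cbl above.
        pad = (ks - 1) // 2 if ks else 0
    return (size + 2 * pad - ks) // stride + 1


print(conv_out_size(13, ks=1))             # 13: a 1x1 conv needs no padding
print(conv_out_size(13, ks=3))             # 13: a 3x3 conv with pad=1
print(conv_out_size(416, ks=3, stride=2))  # 208: a stride of 2 halves the side
```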
- 2. DBL×5 module (followed by DBL + conv)
# Takes an in_filters-channel tensor, alternates between the two widths in filters_list,
# and outputs an out_filter-channel tensor
def _make_embedding(self, filters_list, in_filters, out_filter):
    m = nn.ModuleList([
        self._make_cbl(in_filters, filters_list[0], 1),
        self._make_cbl(filters_list[0], filters_list[1], 3),
        self._make_cbl(filters_list[1], filters_list[0], 1),
        self._make_cbl(filters_list[0], filters_list[1], 3),
        self._make_cbl(filters_list[1], filters_list[0], 1),
        self._make_cbl(filters_list[0], filters_list[1], 3)])
    # Final 1x1 convolution that produces the prediction channels
    m.add_module("conv_out", nn.Conv2d(filters_list[1], out_filter, kernel_size=1,
                                       stride=1, padding=0, bias=True))
    return m
- 3. Forward pass
def forward(self, x):
    # Runs a chain of DBLs; out_branch is the intermediate result that is later
    # concatenated with an earlier backbone feature map
    # Five DBLs add non-linearity, then one DBL + conv produce the final output
    def _branch(_embedding, _in):
        for i, e in enumerate(_embedding):
            _in = e(_in)
            if i == 4:
                # output of the fifth DBL
                out_branch = _in
        return _in, out_branch
    # backbone
    x2, x1, x0 = self.backbone(x)
    # yolo branch 0: a chain of DBLs
    out0, out0_branch = _branch(self.embedding0, x0)
    # yolo branch 1
    x1_in = self.embedding1_cbl(out0_branch)
    x1_in = self.embedding1_upsample(x1_in)
    x1_in = torch.cat([x1_in, x1], 1)
    out1, out1_branch = _branch(self.embedding1, x1_in)
    # yolo branch 2
    x2_in = self.embedding2_cbl(out1_branch)
    x2_in = self.embedding2_upsample(x2_in)
    x2_in = torch.cat([x2_in, x2], 1)
    out2, out2_branch = _branch(self.embedding2, x2_in)
    return out0, out1, out2
- 4. Network initialization
def __init__(self, config, is_training=True):
    super(ModelMain, self).__init__()
    self.config = config
    self.training = is_training
    self.model_params = config["model_params"]
    # darknet_53 backbone
    _backbone_fn = backbone_fn[self.model_params["backbone_name"]]
    self.backbone = _backbone_fn(self.model_params["backbone_pretrained"])
    _out_filters = self.backbone.layers_out_filters
    # Head for y1: DBL×5, then DBL + conv. Produces y1 and its branch feature
    final_out_filter0 = len(config["yolo"]["anchors"][0]) * (5 + config["yolo"]["classes"])
    self.embedding0 = self._make_embedding([512, 1024], _out_filters[-1], final_out_filter0)
    # Head for y2: DBL, UpSample, DBL×5, then DBL + conv. Produces y2 and its branch feature
    final_out_filter1 = len(config["yolo"]["anchors"][1]) * (5 + config["yolo"]["classes"])
    self.embedding1_cbl = self._make_cbl(512, 256, 1)
    self.embedding1_upsample = nn.Upsample(scale_factor=2, mode='nearest')
    self.embedding1 = self._make_embedding([256, 512], _out_filters[-2] + 256, final_out_filter1)
    # Head for y3: DBL, UpSample, DBL×5, then DBL + conv. Produces y3 and its branch feature
    final_out_filter2 = len(config["yolo"]["anchors"][2]) * (5 + config["yolo"]["classes"])
    self.embedding2_cbl = self._make_cbl(256, 128, 1)
    self.embedding2_upsample = nn.Upsample(scale_factor=2, mode='nearest')
    self.embedding2 = self._make_embedding([128, 256], _out_filters[-3] + 128, final_out_filter2)
- Forward pass again, with comments keyed to the figure (same code as item 3)
def forward(self, x):
    # Runs a chain of DBLs; out_branch is the intermediate result that is later
    # concatenated with an earlier backbone feature map
    # Five DBLs add non-linearity, then one DBL + conv produce the final output
    def _branch(_embedding, _in):
        for i, e in enumerate(_embedding):
            _in = e(_in)
            if i == 4:
                # output of the fifth DBL
                out_branch = _in
        return _in, out_branch
    # backbone
    # Produces the three outputs of the dashed box in the figure: x0 is the rightmost output,
    # x1 is the right-hand one of the two downward outputs, and x2 is the left-hand one.
    x2, x1, x0 = self.backbone(x)
    # y1: DBL×5, then DBL + conv
    out0, out0_branch = _branch(self.embedding0, x0)
    # y2: DBL + upsample, concatenate with x1, then DBL×5 and DBL + conv
    x1_in = self.embedding1_cbl(out0_branch)
    x1_in = self.embedding1_upsample(x1_in)
    x1_in = torch.cat([x1_in, x1], 1)
    out1, out1_branch = _branch(self.embedding1, x1_in)
    # y3: DBL + upsample, concatenate with x2, then DBL×5 and DBL + conv
    x2_in = self.embedding2_cbl(out1_branch)
    x2_in = self.embedding2_upsample(x2_in)
    x2_in = torch.cat([x2_in, x2], 1)
    out2, out2_branch = _branch(self.embedding2, x2_in)
    return out0, out1, out2
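The channel and size bookkeeping in `forward` is easier to follow with concrete numbers. The sketch below only traces shapes; it runs no network. It assumes a 416×416 input, backbone outputs x2, x1, x0 with 256, 512 and 1024 channels (the usual Darknet-53 widths, not stated in this post), three anchors per scale and 80 classes. The function name is hypothetical.

```python
def trace_head_shapes(input_size=416, backbone_channels=(256, 512, 1024),
                      num_anchors=3, num_classes=80):
    """Return (channels, height, width) for the tensors named in forward()."""
    c2, c1, c0 = backbone_channels           # channels of x2, x1, x0 (assumed)
    out_c = num_anchors * (5 + num_classes)  # 255 for COCO
    s0 = input_size // 32                    # coarsest grid
    s1, s2 = 2 * s0, 4 * s0                  # each upsample doubles the side
    return {
        "x0": (c0, s0, s0),
        "out0": (out_c, s0, s0),
        # embedding1_cbl maps 512 -> 256, then concat with x1 along channels
        "x1_in": (256 + c1, s1, s1),
        "out1": (out_c, s1, s1),
        # embedding2_cbl maps 256 -> 128, then concat with x2 along channels
        "x2_in": (128 + c2, s2, s2),
        "out2": (out_c, s2, s2),
    }


for name, shape in trace_head_shapes().items():
    print(name, shape)
```

The concatenated widths 768 and 384 are the `_out_filters[-2] + 256` and `_out_filters[-3] + 128` arguments passed to `_make_embedding` in the initializer.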
- 5. Building the ground-truth targets
# Excerpt from the loss class; relies on math, numpy (as np), torch and a bbox_iou helper
def get_target(self, target, anchors, in_w, in_h, ignore_threshold):
    bs = target.size(0)
    # Every tensor below has shape (bs, num_anchors, in_h, in_w);
    # tcls has an extra trailing num_classes dimension
    mask = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    noobj_mask = torch.ones(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    tx = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    ty = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    tw = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    th = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    tconf = torch.zeros(bs, self.num_anchors, in_h, in_w, requires_grad=False)
    tcls = torch.zeros(bs, self.num_anchors, in_h, in_w, self.num_classes, requires_grad=False)
    for b in range(bs):
        for t in range(target.shape[1]):
            # All-zero rows are padding, not real boxes
            if target[b, t].sum() == 0:
                continue
            # Convert the normalized box to grid units to find its cell
            gx = target[b, t, 1] * in_w
            gy = target[b, t, 2] * in_h
            gw = target[b, t, 3] * in_w
            gh = target[b, t, 4] * in_h
            # Indices of the grid cell that contains the box center
            gi = int(gx)
            gj = int(gy)
            # Width and height of the ground-truth box (placed at the origin)
            gt_box = torch.FloatTensor(np.array([0, 0, gw, gh])).unsqueeze(0)
            # Width and height of every anchor (also placed at the origin)
            anchor_shapes = torch.FloatTensor(np.concatenate((np.zeros((self.num_anchors, 2)),
                                                              np.array(anchors)), 1))
            # IoU between the ground-truth box and each anchor box
            anch_ious = bbox_iou(gt_box, anchor_shapes)
            # Anchors whose overlap exceeds the threshold are set to 0 in noobj_mask, i.e. ignored
            noobj_mask[b, anch_ious > ignore_threshold, gj, gi] = 0
            # Pick the anchor that matches best
            best_n = np.argmax(anch_ious)
            # Mark the best-matching anchor as responsible for this box
            mask[b, best_n, gj, gi] = 1
            # x, y offsets of the box center inside its cell
            tx[b, best_n, gj, gi] = gx - gi
            ty[b, best_n, gj, gi] = gy - gj
            # Width and height as log ratios to the anchor size
            tw[b, best_n, gj, gi] = math.log(gw/anchors[best_n][0] + 1e-16)
            th[b, best_n, gj, gi] = math.log(gh/anchors[best_n][1] + 1e-16)
            # Objectness target
            tconf[b, best_n, gj, gi] = 1
            # One-hot encoding of the class label
            tcls[b, best_n, gj, gi, int(target[b, t, 0])] = 1
    return mask, noobj_mask, tx, ty, tw, th, tconf, tcls
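A single box worked by hand shows what `get_target` stores. The sketch below repeats the same steps for one box in plain Python. It is a simplification under stated assumptions: the IoU helper compares only widths and heights with both boxes at the origin and may differ in detail from the repo's `bbox_iou`; the anchors are the commonly used COCO anchors for the stride-32 scale, divided by 32; the function names are hypothetical.

```python
import math


def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same top-left corner."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def encode_target(box, anchors, in_w, in_h):
    """box = (cx, cy, w, h), normalized to [0, 1]; anchors are in grid units."""
    cx, cy, w, h = box
    gx, gy = cx * in_w, cy * in_h    # box center in grid units
    gw, gh = w * in_w, h * in_h      # box size in grid units
    gi, gj = int(gx), int(gy)        # column and row of the responsible cell
    ious = [wh_iou(gw, gh, aw, ah) for aw, ah in anchors]
    best_n = max(range(len(anchors)), key=ious.__getitem__)
    aw, ah = anchors[best_n]
    return {
        "cell": (gj, gi),
        "best_n": best_n,
        "tx": gx - gi,
        "ty": gy - gj,
        "tw": math.log(gw / aw + 1e-16),
        "th": math.log(gh / ah + 1e-16),
    }


# Assumed anchors (116, 90), (156, 198), (373, 326) in pixels, divided by stride 32.
anchors = [(116 / 32, 90 / 32), (156 / 32, 198 / 32), (373 / 32, 326 / 32)]
result = encode_target((0.5, 0.5, 0.3, 0.4), anchors, in_w=13, in_h=13)
print(result)
```

Here the center falls in cell (6, 6) with offsets 0.5 and 0.5, the second anchor matches best, and `tw` is log(3.9 / 4.875) = log(0.8), a negative value because the box is narrower than its anchor.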
- 6. Loss computation
def __init__(self, anchors, num_classes, img_size):
    super(YOLOLoss, self).__init__()
    self.anchors = anchors
    self.num_anchors = len(anchors)
    self.num_classes = num_classes
    # x, y, w, h, confidence plus one score per class
    self.bbox_attrs = 5 + num_classes
    self.img_size = img_size
    # Anchors with IoU above this value are not penalized as background
    self.ignore_threshold = 0.5
    # Weights of the individual loss terms
    self.lambda_xy = 2.5
    self.lambda_wh = 2.5
    self.lambda_conf = 1.0
    self.lambda_cls = 1.0
    self.mse_loss = nn.MSELoss()
    self.bce_loss = nn.BCELoss()
def forward(self, input, targets=None):
    bs = input.size(0)
    in_h = input.size(2)
    in_w = input.size(3)
    # Pixels per grid cell at this scale
    stride_h = self.img_size[1] / in_h
    stride_w = self.img_size[0] / in_w
    # Anchors converted from pixels to grid units
    scaled_anchors = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors]
    # (bs, num_anchors * bbox_attrs, h, w) -> (bs, num_anchors, h, w, bbox_attrs)
    prediction = input.view(bs, self.num_anchors,
                            self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
    # Get outputs
    x = torch.sigmoid(prediction[..., 0])          # Center x
    y = torch.sigmoid(prediction[..., 1])          # Center y
    w = prediction[..., 2]                         # Width
    h = prediction[..., 3]                         # Height
    conf = torch.sigmoid(prediction[..., 4])       # Conf
    pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.
    if targets is not None:
        # build target
        mask, noobj_mask, tx, ty, tw, th, tconf, tcls = self.get_target(targets, scaled_anchors,
                                                                       in_w, in_h,
                                                                       self.ignore_threshold)
        # Note: the targets are moved to the GPU unconditionally
        mask, noobj_mask = mask.cuda(), noobj_mask.cuda()
        tx, ty, tw, th = tx.cuda(), ty.cuda(), tw.cuda(), th.cuda()
        tconf, tcls = tconf.cuda(), tcls.cuda()
        # Individual loss terms: BCE for x, y, confidence and class; MSE for w, h
        loss_x = self.bce_loss(x * mask, tx * mask)
        loss_y = self.bce_loss(y * mask, ty * mask)
        loss_w = self.mse_loss(w * mask, tw * mask)
        loss_h = self.mse_loss(h * mask, th * mask)
        loss_conf = self.bce_loss(conf * mask, mask) + \
            0.5 * self.bce_loss(conf * noobj_mask, noobj_mask * 0.0)
        loss_cls = self.bce_loss(pred_cls[mask == 1], tcls[mask == 1])
        # total loss = losses * weight
        loss = loss_x * self.lambda_xy + loss_y * self.lambda_xy + \
            loss_w * self.lambda_wh + loss_h * self.lambda_wh + \
            loss_conf * self.lambda_conf + loss_cls * self.lambda_cls
        return loss, loss_x.item(), loss_y.item(), loss_w.item(),\
            loss_h.item(), loss_conf.item(), loss_cls.item()
    else:
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        # Calculate offsets for each grid
        # Note: as written, the repeat counts assume a square feature map (in_w == in_h)
        grid_x = torch.linspace(0, in_w-1, in_w).repeat(in_w, 1).repeat(
            bs * self.num_anchors, 1, 1).view(x.shape).type(FloatTensor)
        grid_y = torch.linspace(0, in_h-1, in_h).repeat(in_h, 1).t().repeat(
            bs * self.num_anchors, 1, 1).view(y.shape).type(FloatTensor)
        # Calculate anchor w, h
        anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape)
        anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape)
        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + grid_x
        pred_boxes[..., 1] = y.data + grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * anchor_h
        # Results: boxes scaled back to pixels, followed by confidence and class scores
        _scale = torch.Tensor([stride_w, stride_h] * 2).type(FloatTensor)
        output = torch.cat((pred_boxes.view(bs, -1, 4) * _scale,
                            conf.view(bs, -1, 1), pred_cls.view(bs, -1, self.num_classes)), -1)
        return output.data
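The inference branch above inverts the target encoding: sigmoid plus the cell index gives the center, exp times the anchor gives the size, and the stride converts grid units back to pixels. The scalar sketch below decodes one box that way. The function name and the example values (cell, anchor, stride) are assumptions for illustration, and a single stride is used for both axes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(raw, cell, anchor, stride):
    """Turn raw network outputs for one box into pixel coordinates.

    raw    = (t_x, t_y, t_w, t_h) before any activation
    cell   = (gj, gi), row and column of the grid cell
    anchor = (a_w, a_h) in grid units
    """
    t_x, t_y, t_w, t_h = raw
    gj, gi = cell
    a_w, a_h = anchor
    bx = (sigmoid(t_x) + gi) * stride   # center x in pixels
    by = (sigmoid(t_y) + gj) * stride   # center y in pixels
    bw = math.exp(t_w) * a_w * stride   # width in pixels
    bh = math.exp(t_h) * a_h * stride   # height in pixels
    return bx, by, bw, bh


# All-zero raw outputs land in the middle of the cell with exactly the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 6), anchor=(4.875, 6.1875), stride=32))
# (208.0, 208.0, 156.0, 198.0)
```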