This article uses Faster R-CNN as an example to show how to modify the MMDetection v2 config files to train on a custom dataset in VOC format.
Updated 2021.9.1: adapted to MMDetection v2.16
Contents:
- MMDetection v2 Object Detection (1): Environment Setup
- MMDetection v2 Object Detection (2): Data Preparation
- MMDetection v2 Object Detection (3): Config Modification
- MMDetection v2 Object Detection (4): Model Training and Testing
Server environment:
- Ubuntu: 18.04.5
- CUDA: 10.1.243
- Python: 3.7.9
- PyTorch: 1.5.1
- MMDetection: 2.16.0
1 Modifying the Base Config
The directory structure of ./configs/_base_:
_base_
├─ datasets
├─ models
├─ schedules
└─ default_runtime.py
As shown, it contains four kinds of configs:
- datasets: dataset definitions
- models: model architectures
- schedules: training schedules
- default_runtime.py: runtime settings
Open ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py:
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
Change the path of the dataset config:
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/voc0712.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
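Under the hood, mmcv's Config loads every file in _base_ and merges them into one dict, with keys set in the child file overriding the base values. A minimal sketch of that merge semantics (simplified; not mmcv's actual implementation):

```python
def merge_cfg(base, override):
    """Recursively merge `override` into `base` (child keys win),
    mimicking how _base_ inheritance behaves in mmcv's Config."""
    merged = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], val)  # deep-merge nested dicts
        else:
            merged[key] = val  # scalars and lists are replaced outright
    return merged

base = {'model': {'type': 'FasterRCNN', 'roi_head': {'num_classes': 80}}}
child = {'model': {'roi_head': {'num_classes': 3}}}
cfg = merge_cfg(base, child)
# 'type' is inherited from the base, 'num_classes' is overridden by the child
```

This is why a custom config (see step 6 below) only needs to restate the keys it changes.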
2 Modifying the Dataset Config
Open ./configs/_base_/datasets/voc0712.py:
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                data_root + 'VOC2012/ImageSets/Main/trainval.txt'
            ],
            img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')  # interval in epochs
- Change the dataset paths data_root, ann_file, and img_prefix, adjust the repeat count times, and add the class names classes:
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/MyDataset/'
classes = ('car', 'pedestrian', 'cyclist')
data = dict(
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/train.txt',
        img_prefix=data_root,
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/val.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes))
Tips:
MyDataset in data_root can be replaced with any name for your custom dataset.
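Before training, it helps to verify that every image id listed in ann_file actually has a matching image and XML annotation on disk. A small sanity-check sketch, assuming the VOC-style layout above (the helper name and default extension are illustrative, not part of MMDetection):

```python
import os.path as osp


def check_voc_split(data_root, split_file, img_ext='.jpg'):
    """Return the image ids from `split_file` that are missing an image
    or an XML annotation under a VOC-style `data_root`."""
    with open(osp.join(data_root, 'ImageSets/Main', split_file)) as f:
        img_ids = [line.strip() for line in f if line.strip()]
    missing = []
    for img_id in img_ids:
        img = osp.join(data_root, 'JPEGImages', img_id + img_ext)
        xml = osp.join(data_root, 'Annotations', img_id + '.xml')
        if not (osp.exists(img) and osp.exists(xml)):
            missing.append(img_id)
    return missing
```

Running it once per split (train/val/test) catches stale ids before they become cryptic loading errors mid-training.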
- Add image augmentations and change the image scale img_scale:
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='AutoAugment',
        policies=[
            [dict(
                type='Rotate',
                level=5,
                img_fill_val=(124, 116, 104),
                prob=0.5,
                scale=1)
            ],
            [dict(type='Rotate', level=7, img_fill_val=(124, 116, 104)),
             dict(
                 type='Translate',
                 level=5,
                 prob=0.5,
                 img_fill_val=(124, 116, 104))
            ],
        ]),
    # single-scale
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    # multi-scale (use instead of the single-scale Resize above)
    # dict(
    #     type='Resize',
    #     img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
    #                (1333, 768), (1333, 800)],
    #     multiscale_mode='value',
    #     keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
Tips:
If img_scale is a single float, it is used directly as the scale factor.
If it is a pair of integers, the scale factor is computed from the long and short edges so that neither resized edge exceeds the corresponding value; the image is then resized by that factor.
Passing several pairs enables multi-scale training.
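The long/short-edge rule above can be made concrete. A sketch of the output-size computation for keep_ratio=True resizing (following the logic of mmcv's imrescale, simplified):

```python
def rescale_size(w, h, scale):
    """Compute the output size for keep_ratio resizing.
    `scale` is an (edge_a, edge_b) pair, e.g. (1333, 800): the scale
    factor is capped so the long edge stays within max(scale) and the
    short edge within min(scale)."""
    max_edge, min_edge = max(scale), min(scale)
    factor = min(max_edge / max(w, h), min_edge / min(w, h))
    # round half up, as mmcv does
    return int(w * factor + 0.5), int(h * factor + 0.5), factor


# a 1280x720 image resized with img_scale=(1333, 800)
new_w, new_h, factor = rescale_size(1280, 720, (1333, 800))
# → (1333, 750): the long edge hits 1333 first, the short edge stays under 800
```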
Note:
The official docs recommend replacing ImageToTensor with DefaultFormatBundle in test_pipeline.
3 Modifying the Model Config
Open ./configs/_base_/models/faster_rcnn_r50_fpn.py:
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,  # change to the number of classes in your dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
Change the number of classes num_classes in roi_head:
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=3,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
Note:
Since MMDetection v2.0, num_classes no longer needs the extra +1 for the background class.
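A common mistake is letting num_classes drift out of sync with the classes tuple when the dataset changes. A tiny guard (purely illustrative, not part of MMDetection) catches this early:

```python
classes = ('car', 'pedestrian', 'cyclist')
num_classes = 3  # the value set in the model config

# since MMDetection v2.0 there is no +1 for background, so these must match exactly
assert num_classes == len(classes), (
    f'num_classes ({num_classes}) != number of classes ({len(classes)})')
```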
4 Modifying the Training Schedule Config
Open ./configs/_base_/schedules/schedule_1x.py:
# optimizer
optimizer = dict(
    type='SGD',  # e.g. 'SGD', 'Adadelta', 'Adagrad', 'Adam', 'RMSprop'
    lr=0.02,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',  # e.g. 'step', 'cyclic', 'poly', 'CosineAnnealing'
    warmup='linear',  # 'constant', 'linear', 'exp', or None to disable warmup
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
Change the learning rate lr and the number of training epochs max_epochs:
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.02 / 8,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[7])
runner = dict(max_epochs=8)
Tips:
The default learning rate lr=0.02 for Faster R-CNN assumes a total batch size of batch_size=16, so scale the learning rate in proportion to your actual batch size.
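This linear scaling rule can be written out explicitly. A sketch, assuming the default 0.02 at batch size 16 (the helper name is made up for illustration):

```python
def scaled_lr(samples_per_gpu, num_gpus, base_lr=0.02, base_batch=16):
    """Scale the base learning rate linearly with the effective batch size."""
    return base_lr * (samples_per_gpu * num_gpus) / base_batch


# e.g. a single GPU with samples_per_gpu=2: effective batch size 2
lr = scaled_lr(samples_per_gpu=2, num_gpus=1)
# → 0.0025, which is exactly the 0.02 / 8 used in the config above
```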
5 Modifying the Runtime Config
Open ./configs/_base_/default_runtime.py:
checkpoint_config = dict(interval=1)  # interval in epochs
# yapf:disable
log_config = dict(
    interval=50,  # interval in iterations
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]  # can also be [('train', 1), ('val', 1)]
Change the logging interval of log_config and enable the TensorBoard logger:
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
6 Creating a Custom Config
Alternatively, all the changes from steps 1-5 can be collected in a single file.
This makes different configs much easier to manage and avoids errors caused by repeatedly editing the base files.
- Go to the configs directory:
cd configs
- Create a directory for custom configs:
mkdir myconfig
- In ./myconfig, create faster_rcnn_r50_fpn_1x_mydataset.py:
# base configs
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/voc0712.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
# dataset config
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/MyDataset/'
classes = ('car', 'pedestrian', 'cyclist')
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='AutoAugment',
        policies=[
            [dict(
                type='Rotate',
                level=5,
                img_fill_val=(124, 116, 104),
                prob=0.5,
                scale=1)
            ],
            [dict(type='Rotate', level=7, img_fill_val=(124, 116, 104)),
             dict(
                 type='Translate',
                 level=5,
                 prob=0.5,
                 img_fill_val=(124, 116, 104))
            ],
        ]),
    # single-scale
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    # multi-scale (use instead of the single-scale Resize above)
    # dict(
    #     type='Resize',
    #     img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
    #                (1333, 768), (1333, 800)],
    #     multiscale_mode='value',
    #     keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/train.txt',
        img_prefix=data_root,
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/val.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes))
# model config
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=3)))
# training schedule config
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.02 / 8,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[7])
runner = dict(max_epochs=8)
# runtime config
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
evaluation = dict(interval=1, metric='mAP')
7 Other Changes
Below are a few error-prone spots encountered during training and testing, recorded here for reference.
7.1 Class Names
- Open ./mmdet/datasets/voc.py and change the class names CLASSES of VOCDataset():
class VOCDataset(XMLDataset):
    CLASSES = ('car', 'pedestrian', 'cyclist')
    # original:
    # CLASSES = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
    #            'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
    #            'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
    #            'tvmonitor')
- Open ./mmdet/core/evaluation/class_names.py and change the class names returned by voc_classes():
def voc_classes():
    return [
        'car', 'pedestrian', 'cyclist'
    ]
    # original:
    # return [
    #     'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    #     'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    #     'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
    # ]
Note:
In the code above, if there is only one class, it must be followed by a trailing comma, otherwise an error is raised.
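The trailing comma matters because parentheses alone do not make a Python tuple: a one-element "tuple" without the comma is just a string, and iterating over it yields individual characters instead of class names:

```python
CLASSES = ('car',)     # a 1-tuple containing one class name
NOT_A_TUPLE = ('car')  # just the string 'car' in redundant parentheses

# iterating over the string walks over characters 'c', 'a', 'r',
# which is why the missing comma causes confusing class-count errors
```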
7.2 Other Details
- Open ./mmdet/datasets/voc.py. If the dataset directory has a custom name, comment out the ValueError and set self.year to None:
class VOCDataset(XMLDataset):

    def __init__(self, **kwargs):
        super(VOCDataset, self).__init__(**kwargs)
        if 'VOC2007' in self.img_prefix:
            self.year = 2007
        elif 'VOC2012' in self.img_prefix:
            self.year = 2012
        else:
            self.year = None
            # raise ValueError('Cannot infer dataset year from img_prefix')
Tips:
The year mainly determines which standard is used to compute AP:
VOC2007 uses 11-point interpolation (i.e. recall thresholds [0, 0.1, ..., 1]), while other datasets use the area under the precision-recall curve (AUC).
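The difference between the two AP conventions can be sketched on a toy precision/recall curve (simplified for illustration; MMDetection's real implementation lives in mmdet/core/evaluation/mean_ap.py):

```python
def ap_11point(recalls, precisions):
    """VOC2007-style AP: average the max precision at recall >= t
    over the 11 thresholds t = 0, 0.1, ..., 1."""
    ap = 0.0
    for t in [i / 10 for i in range(11)]:
        p = max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0)
        ap += p / 11
    return ap


def ap_auc(recalls, precisions):
    """Area under a stepwise precision-recall curve (right-point rule)."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0
recalls = [0.5, 1.0]
precisions = [1.0, 0.5]
# ap_11point → 8.5/11 ≈ 0.773, ap_auc → 0.75: the two standards disagree
```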
- Open ./mmdet/datasets/xml_style.py. If the image files are not in jpg format, change the suffix of filename and img_path to the appropriate extension:
def load_annotations(self, ann_file):
    data_infos = []
    img_ids = mmcv.list_from_file(ann_file)
    for img_id in img_ids:
        # filename = f'JPEGImages/{img_id}.jpg'
        filename = f'JPEGImages/{img_id}.png'
        xml_path = osp.join(self.img_prefix, 'Annotations',
                            f'{img_id}.xml')
        tree = ET.parse(xml_path)
        root = tree.getroot()
        size = root.find('size')
        width = 0
        height = 0
        if size is not None:
            width = int(size.find('width').text)
            height = int(size.find('height').text)
        else:
            # img_path = osp.join(self.img_prefix, 'JPEGImages',
            #                     '{}.jpg'.format(img_id))
            img_path = osp.join(self.img_prefix, 'JPEGImages',
                                '{}.png'.format(img_id))
            img = Image.open(img_path)
            width, height = img.size
        data_infos.append(
            dict(id=img_id, filename=filename, width=width, height=height))
    return data_infos
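Instead of hard-coding the suffix, the extension could also be detected per image. A hypothetical helper (not part of MMDetection; the name and default extension list are assumptions):

```python
import os.path as osp


def find_image(img_dir, img_id, exts=('.jpg', '.png', '.jpeg', '.bmp')):
    """Return the filename of `img_id` under `img_dir`,
    trying a list of common suffixes in order."""
    for ext in exts:
        filename = img_id + ext
        if osp.exists(osp.join(img_dir, filename)):
            return filename
    raise FileNotFoundError(f'no image found for id {img_id} in {img_dir}')
```

This keeps load_annotations working even for datasets that mix image formats.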
If the annotation files contain no difficult tag, set difficult to 0:
def get_ann_info(self, idx):
    img_id = self.data_infos[idx]['id']
    xml_path = osp.join(self.img_prefix, 'Annotations', f'{img_id}.xml')
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name not in self.CLASSES:
            continue
        label = self.cat2label[name]
        # difficult = int(obj.find('difficult').text)
        try:
            difficult = int(obj.find('difficult').text)
        except AttributeError:
            difficult = 0
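The same fallback can be written without try/except using ElementTree's findtext with a default value. A self-contained sketch on an annotation that lacks the difficult tag (the XML snippet is illustrative):

```python
import xml.etree.ElementTree as ET

xml = '''<annotation>
  <object><name>car</name></object>
</annotation>'''

root = ET.fromstring(xml)
obj = root.find('object')
# findtext returns the default string when the child tag is absent
difficult = int(obj.findtext('difficult', default='0'))
# → 0
```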
Note:
The latest version has already fixed this issue, so this change can be skipped there.
- Open ./tools/robustness_eval.py and change the 20 in results to the number of classes in your dataset:
def get_voc_style_results(filename, prints='mPC', aggregate='benchmark'):
    eval_output = mmcv.load(filename)
    num_distortions = len(list(eval_output.keys()))
    # results = np.zeros((num_distortions, 6, 20), dtype='float32')
    results = np.zeros((num_distortions, 6, 3), dtype='float32')
8 Closing
If this article helped, please give it a like before you go. Thanks!