Recently I have been hooked on transformers: I tried out the DETR object detection project and also dabbled in NLP machine translation. Transformers are showing a trend of unifying NLP and CV. Of course CNNs keep their own strengths and place; the current direction is rather to fuse transformers with CNNs and push them into more domains. Meanwhile, with MSRA's Swin Transformer, the notoriously slow training of the original DETR line has largely been addressed. Arguably it is with Swin Transformer that transformers started to show real practical value in CV, much as YOLOv3 did for object detection.
Before diving in, it is worth understanding where the transformer came from, namely NLP; these notes follow PaddlePaddle's Vision Transformer course.
From one angle the name "transformer" is very fitting: just like the Transformers robots. We know the earliest Transformers could not actually transform; later, by scanning cars, planes and other objects, they converted those features into their own structure and transformation mode, and so could turn any observed form into a robot form through one mechanism. That is precisely the transformer's core idea.
The transformer first appeared in NLP, where in recent years it has essentially taken over the top ten of the benchmarks. Starting in 2020 it began heating up in the image domain as well, topping the leaderboards in image classification, object detection, image segmentation and more.
Taking object detection as an example: the original DETR demonstrated that vision transformers are feasible, Deformable DETR and Anchor DETR demonstrated that they are effective, and Swin went further to demonstrate that transformers generalize across domains, vision and multi-modal tasks in particular. Just like the Transformers: as long as an encoder-decoder can map data of one modality into another, the model becomes general-purpose.
Transformers also map the features of different modalities into one shared space, which makes learning transferable, much like human learning: seeing an apple, looking at a drawing of one, and hearing it described in words lets us learn "apple" from all sides and map it onto a single concept with its extensions.
Multilingual translation is similar: rather than learning every pairwise conversion between languages, it is better to map into a shared semantic space and decode into any target language. Humans learn the same way, which is why the more foreign languages you learn, the faster the next one comes.
So both NLP and CV are shifting toward large-scale pretraining plus transfer fine-tuning: downstream tasks are solved by fine-tuning a pretrained model rather than building a separate model for each task as before.
Learning method matters. Transformers look theoretically much leaner than the zoo of earlier architectures (DETR, for instance), but mastering and tuning them is very hard, and today's models carry enormous parameter counts. The core principle: coding is all you need. Practice first.
who am I
where am I
what should I do
When learning neural networks, the data structures are also the first thing to get familiar with: watch how the data is transformed at each step.
Enough talk; let's set up Swin and try it out.
Deploying and training STOD (Swin-Transformer-Object-Detection)
Code: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection
Paper: https://arxiv.org/pdf/2103.14030.pdf
Environment setup
Swin object detection is built by Microsoft Research Asia on top of the mmdetection library, so first build an mmdetection image.
Use the Dockerfile in the official docker/ folder to build the image yihui8776/mmdetection:v1:
ARG PYTORCH="1.6.0"
ARG CUDA="10.1"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install MMCV
#RUN pip install --default-timeout=200 mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
RUN pip install --default-timeout=2000 mmcv-full==1.0.5 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html --trusted-host download.openmmlab.com
#COPY mmcv_full-1.3.9-cp37-cp37m-manylinux1_x86_64.whl /workspace
#RUN pip install /workspace/mmcv_full-1.3.9-cp37-cp37m-manylinux1_x86_64.whl
# Install MMDetection
RUN conda clean --all
RUN git clone https://github.com/open-mmlab/mmdetection.git /mmdetection
WORKDIR /mmdetection
ENV FORCE_CUDA="1"
RUN pip install -r requirements/build.txt
RUN pip install --no-cache-dir -e . -i https://pypi.douban.com/simple
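Assuming the Dockerfile above sits at docker/Dockerfile in the repo, a typical build invocation from the repo root looks like:
docker build -t yihui8776/mmdetection:v1 docker/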
Then build the image yihui8776/swindetr:v0.1 on top of it:
FROM yihui8776/mmdetection:v1
MAINTAINER yihui8776 <wangyaohui8776@sina.com>
RUN apt-get update
RUN apt-get install -y vim openssh-server && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# SSH Server
RUN sed -i 's/^\(PermitRootLogin\).*/\1 yes/g' /etc/ssh/sshd_config && \
sed -i 's/^PermitEmptyPasswords .*/PermitEmptyPasswords yes/g' /etc/ssh/sshd_config && \
echo 'root:ai1234' > /tmp/passwd && \
chpasswd < /tmp/passwd && \
rm -rf /tmp/passwd
RUN pip install jupyter -i https://pypi.doubanio.com/simple
COPY . /workspace
COPY run_jupyter.sh /
RUN chmod +x /run_jupyter.sh
WORKDIR /workspace
EXPOSE 22
EXPOSE 8888
CMD ["/run_jupyter.sh", "--allow-root"]
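Likewise, a typical build command, assuming this Dockerfile and run_jupyter.sh sit in the current directory (the COPY lines need them in the build context):
docker build -t yihui8776/swindetr:v0.1 .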
Run the container (--shm-size is raised because PyTorch dataloader workers allocate shared memory):
docker run --gpus '"device=1,2,3"' -itd --shm-size 12G -v /media/nizhengqi/sdf/wyh/data:/workspace/data -v /media/nizhengqi/sdf/wyh/Swin-Transformer-Object-Detection:/workspace -p 8890:8888 -p 2223:2222 --name swdetr yihui8776/swindetr:v0.1
Enter the container, then build and install apex:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -r requirements.txt
python setup.py install --cpp_ext
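A quick smoke test that apex is importable (its amp module is what the Swin training code uses for mixed precision):
python -c "from apex import amp; print('apex OK')"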
Test-compile mmdet. If you hit the error AttributeError: module 'pycocotools' has no attribute '__version__', then:
pip uninstall pycocotools
pip install mmpycocotools
Test the install:
python demo/image_demo.py demo/demo.jpg configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py mask_rcnn_swin_tiny_patch4_window7.pth
The pretrained model files can be downloaded from the GitHub repo.
Training on your own dataset
Data conversion
Training mainly uses the Mask R-CNN config, so the data must be converted to COCO format with a segmentation field added to each annotation. The conversion is basically the same as for DETR: VOC to COCO. For box-only labels, the four box corners serve as the segmentation polygon:
# trace the rectangle corner by corner so the polygon does not self-intersect
points = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
seg = [np.asarray(points).flatten().tolist()]  # COCO polygon format: [x1, y1, x2, y2, ...]
ann = {
    "area": o_width * o_height,
    "iscrowd": 0,
    "image_id": image_id,
    "bbox": [xmin, ymin, o_width, o_height],  # COCO bbox format: [x, y, w, h]
    "category_id": category_id,
    "id": bnd_id,
    "ignore": 0,
    "segmentation": seg,
}
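For reference, a minimal sketch of how one VOC <object> element maps to such a COCO annotation; the function name and parsing details are illustrative, not the exact voc2coco.py used below:

import numpy as np

def voc_object_to_coco_ann(obj, image_id, bnd_id, category_id):
    """obj is a parsed VOC <object> element (xml.etree.ElementTree.Element)."""
    box = obj.find('bndbox')
    xmin, ymin, xmax, ymax = (int(float(box.find(k).text))
                              for k in ('xmin', 'ymin', 'xmax', 'ymax'))
    o_width, o_height = xmax - xmin, ymax - ymin
    # corners traced in order so the polygon is a proper rectangle
    points = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    return {
        "area": o_width * o_height,
        "iscrowd": 0,
        "image_id": image_id,
        "bbox": [xmin, ymin, o_width, o_height],
        "category_id": category_id,
        "id": bnd_id,
        "ignore": 0,
        "segmentation": [np.asarray(points).flatten().tolist()],
    }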
The dataset sits at /media/nizhengqi/sdf/wyh/data/safehat on the host, i.e. data/safehat inside the container.
python voc2coco.py xml/xml_train.py annotations/instances_train2017.json
python voc2coco.py xml/xml_val.py annotations/instances_val2017.json
The images likewise go in data/safehat/train2017 and data/safehat/val2017.
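A quick sanity check of the generated annotations with pycocotools (paths assume the container layout above):

from pycocotools.coco import COCO

coco = COCO('data/safehat/annotations/instances_train2017.json')
print('images:', len(coco.imgs), 'annotations:', len(coco.anns))
print('categories:', [c['name'] for c in coco.loadCats(coco.getCatIds())])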
Next edit the dataset paths in configs/_base_/datasets/coco_detection.py and adjust samples_per_gpu and workers_per_gpu:
# change the dataset type and root path
dataset_type = 'CocoDataset'
data_root = '/home/coco/'
# adjust img_scale and friends; shrinking it helps with CUDA out of memory
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    # originally 1333x800
    #dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Resize', img_scale=(416, 416), keep_ratio=True),
# change the batch size
data = dict(
    samples_per_gpu=1,  # samples per GPU; batch_size = num_gpus * samples_per_gpu
    workers_per_gpu=1,  # dataloader workers per GPU
    # the train split as an example
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',  # annotation file
        img_prefix=data_root + 'train2017/',  # training image folder
        pipeline=train_pipeline),
Modifying the weight file
The main change is resizing the prediction heads to your own class count. cat changeclass.py:
import torch

#model_path = "E:/workspace/Swin-Transformer-Object-Detection/checkpoints/mask_rcnn_swin_tiny_patch4_window7.pth"
model_save_dir = "./"
pretrained_weights = torch.load('mask_rcnn_swin_tiny_patch4_window7.pth')
#pretrained_weights = torch.load('E:\workspace\Swin-Transformer-Object-Detection\checkpoints\cascade_mask_rcnn_swin_small_patch4_window7.pth')
num_class = 35  # actual number of classes
# resize the classification, regression and mask heads in place to match num_class
pretrained_weights['state_dict']['roi_head.bbox_head.fc_cls.weight'].resize_(num_class + 1, 1024)
pretrained_weights['state_dict']['roi_head.bbox_head.fc_cls.bias'].resize_(num_class + 1)
pretrained_weights['state_dict']['roi_head.bbox_head.fc_reg.weight'].resize_(num_class * 4, 1024)
pretrained_weights['state_dict']['roi_head.bbox_head.fc_reg.bias'].resize_(num_class * 4)
pretrained_weights['state_dict']['roi_head.mask_head.conv_logits.weight'].resize_(num_class, 256, 1, 1)
pretrained_weights['state_dict']['roi_head.mask_head.conv_logits.bias'].resize_(num_class)
torch.save(pretrained_weights, "{}/mask_rcnn_swin_{}.pth".format(model_save_dir, num_class))
#torch.save(pretrained_weights, "{}/cascade_mask_rcnn_swin_{}.pth".format(model_save_dir, num_class))
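To confirm the surgery worked, a quick check (a sketch) that the saved heads match the new class count:

import torch

ckpt = torch.load('mask_rcnn_swin_35.pth', map_location='cpu')
sd = ckpt['state_dict']
print(sd['roi_head.bbox_head.fc_cls.weight'].shape)       # expect (36, 1024): 35 classes + background
print(sd['roi_head.mask_head.conv_logits.weight'].shape)  # expect (35, 256, 1, 1)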
In configs/_base_/models/mask_rcnn_swin_fpn.py, change num_classes to the actual class count; it appears in two places (the bbox head and the mask head), here 35.
In configs/_base_/default_runtime.py, change interval and load_from:
checkpoint_config = dict(interval=1)  # save a checkpoint after every epoch
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
#load_from = None
load_from = "mask_rcnn_swin_35.pth"  # path of the modified checkpoint, used to initialize the model
resume_from = None  # set this to resume an interrupted run
workflow = [('train', 1)]  # training workflow
To adjust the number of epochs, learning rate and similar parameters, edit max_epochs and lr in configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py.
In that file, the _base_ list is changed to use the coco_detection config edited above:
_base_ = [
    '../_base_/models/mask_rcnn_swin_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
data = dict(train=dict(pipeline=train_pipeline))
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
                 paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                 'relative_position_bias_table': dict(decay_mult=0.),
                                                 'norm': dict(decay_mult=0.)}))
lr_config = dict(step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
# if you are not using fp16, comment out the block below
# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
Modify the label lists in mmdet/core/evaluation/class_names.py and mmdet/datasets/coco.py:
# in mmdet/datasets/coco.py (35 entries; note 'big_smoking' appears twice in this list)
CLASSES = ('hat', 'person', 'hand', 'insulating_gloves', 'workclothes_clothes',
           'workclothes_trousers', 'winter_clothes', 'winter_trousers', 'vest',
           'noworkclothes_clothes', 'noworkclothes_trousers', 'roll_workclothes',
           'roll_shirts', 'roll_noworkclothes', 'shorts', 'safteybelt',
           'work_men', 'stranger_men', 'down', 'smoking', 'big_smoking',
           'height', 'noheight', 'holes', 'fence', 'oxygen_vertically',
           'oxygen_horizontally', 'single_ladder', 'double_ladder', 'fire',
           'gas_tank', 'extinguisher', 'groundrod', 'big_smoking', 'bottle')
# in mmdet/core/evaluation/class_names.py
def coco_classes():
    return [
        'hat', 'person', 'hand', 'insulating_gloves', 'workclothes_clothes',
        'workclothes_trousers', 'winter_clothes', 'winter_trousers', 'vest',
        'noworkclothes_clothes', 'noworkclothes_trousers', 'roll_workclothes',
        'roll_shirts', 'roll_noworkclothes', 'shorts', 'safteybelt',
        'work_men', 'stranger_men', 'down', 'smoking', 'big_smoking',
        'height', 'noheight', 'holes', 'fence', 'oxygen_vertically',
        'oxygen_horizontally', 'single_ladder', 'double_ladder', 'fire',
        'gas_tank', 'extinguisher', 'groundrod', 'big_smoking', 'bottle'
    ]
After these edits you must recompile with python setup.py install,
otherwise you will hit the error "AssertionError: The num_classes (20) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 80) in RepeatDataset".
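After reinstalling, a quick check (a sketch) inside the container that mmdet picked up the new labels:

from mmdet.datasets import CocoDataset
from mmdet.core.evaluation.class_names import coco_classes

assert len(CocoDataset.CLASSES) == len(coco_classes()) == 35, \
    'class lists out of sync: rerun python setup.py install'
print(CocoDataset.CLASSES[:5])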
Training
Once everything is modified, start training: python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
Train on a single GPU, here the one with id 3:
python ./tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --gpu-ids 3
Train on multiple GPUs:
tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 4
Training logs and weights are saved under "Swin-Transformer-Object-Detection-master/work_dirs/".
Testing
python tools/test.py configs/swin/mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py mask_rcnn_swin_small_patch4_window7.pth --eval segm
Demo output script: writes one txt per image, one detection per line in cls x1 y1 x2 y2 format:
from argparse import ArgumentParser
import os

import numpy as np
from tqdm import tqdm

from mmdet.apis import inference_detector, init_detector


def main():
    parser = ArgumentParser()
    parser.add_argument('--img-path', default='/data/wj/test/', help='Image folder')
    parser.add_argument('--config', default='../work_dirs/cascade_rcnn_x101_64x4d_fpn_20e_coco/cascade_rcnn_x101_64x4d_fpn_20e_coco.py', help='Config file')
    parser.add_argument('--checkpoint', default='../work_dirs/cascade_rcnn_x101_64x4d_fpn_20e_coco/latest.pth', help='Checkpoint file')
    parser.add_argument('--device', default='cuda:0', help='Device used for inference')
    parser.add_argument('--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()

    imgs_path = args.img_path
    save_path = '../output/'
    # build the model from a config file and a checkpoint file
    model = init_detector(args.config, args.checkpoint, device=args.device)
    for img_name in tqdm(os.listdir(imgs_path)):
        img = os.path.join(imgs_path, img_name)
        result = inference_detector(model, img)
        # Mask R-CNN style models return (bbox_results, segm_results); keep the boxes
        if isinstance(result, tuple):
            result = result[0]
        # result is a per-class list of (n, 5) arrays: x1, y1, x2, y2, score
        bboxes = np.vstack(result)
        labels = np.concatenate([
            np.full(bbox.shape[0], i, dtype=np.int32)
            for i, bbox in enumerate(result)
        ])
        if args.score_thr > 0:
            assert bboxes.shape[1] == 5
            inds = bboxes[:, -1] > args.score_thr
            bboxes = bboxes[inds, :]
            labels = labels[inds]
        txt_path = os.path.join(save_path, '{}.txt'.format(img_name.split('.')[0]))
        with open(txt_path, 'w') as f:  # empty file if nothing passes the threshold
            for bbox, label in zip(bboxes, labels):
                x1, y1, x2, y2 = bbox[:4].astype(np.int32)  # drop the score column
                f.write("{} {} {} {} {}\n".format(label, x1, y1, x2, y2))


if __name__ == '__main__':
    main()
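Assuming the script is saved as tools/demo_output.py (the name and paths are illustrative), a typical run:
python tools/demo_output.py --img-path data/safehat/val2017/ --config configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py --checkpoint work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/latest.pth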
Common problem: when training a detector with the mmdetection 2.0 framework, an IndexError: list index out of range is very likely a class-count issue. Check that:
the CLASSES variable in mmdetection/mmdet/datasets/coco.py lists the right classes;
coco_classes() in mmdetection/mmdet/core/evaluation/class_names.py returns the right classes;
num_classes in mmdetection/configs/_base_/models/mask_rcnn_swin_fpn.py matches the class count.
Go over these repeatedly, and recompile at the end; that last step is often omitted in online write-ups.