Project repository: caffe/ssd
SSD: Single Shot MultiBox Detector
By Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg.
Introduction
SSD is a unified framework for object detection with a single network. You can use this code to train and evaluate object detection models. See the arXiv paper and the slides for more details.
System | VOC2007 test mAP | FPS (Titan X) | Number of Boxes | Input resolution |
---|---|---|---|---|
Faster R-CNN (VGG16) | 73.2 | 7 | ~6000 | ~1000 x 600 |
YOLO (customized) | 63.4 | 45 | 98 | 448 x 448 |
SSD300* (VGG16) | 77.2 | 46 | 8732 | 300 x 300 |
SSD512* (VGG16) | 79.8 | 19 | 24564 | 512 x 512 |
Note: SSD300* and SSD512* are the latest models. Current code should reproduce these results.
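As a quick sanity check on the table, the 8732 boxes of SSD300 come from the default boxes tiled over its six prediction feature maps; a minimal sketch of that arithmetic, using the feature-map sizes and boxes-per-location from the SSD300 configuration in the paper:

```python
# SSD300 prediction feature maps: (grid side, default boxes per location).
ssd300_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

total = sum(side * side * boxes for side, boxes in ssd300_maps)
print(total)  # 8732, matching the "Number of Boxes" column above
```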
Contents
Installation
- Download the code. We assume Caffe is cloned into the `$CAFFE_ROOT` directory:
git clone https://github.com/weiliu89/caffe.git
cd caffe
git checkout ssd
- Build the code. Follow the Caffe instructions to install the necessary packages, then build:
# Modify Makefile.config according to how Caffe was installed on your system.
cp Makefile.config.example Makefile.config
make -j8
# Make sure $CAFFE_ROOT/python is included in your PYTHONPATH environment variable.
make py
make test -j8
# Run the tests (optional).
make runtest -j8
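A quick way to confirm that `make py` succeeded and that PYTHONPATH is set correctly is to import the Python bindings; a minimal check, assuming `$CAFFE_ROOT/python` is already on PYTHONPATH:

```python
# Sanity check for the pycaffe build.
import caffe

print(caffe.__file__)   # should point into $CAFFE_ROOT/python/caffe
caffe.set_mode_cpu()    # CPU mode is enough just to verify the bindings load
```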
Preparation
- Download the fully convolutional reduced (atrous) VGGNet. We assume the file is downloaded into the `$CAFFE_ROOT/models/VGGNet/` directory.
- Download the VOC2007 and VOC2012 datasets. We assume they are downloaded into the `$HOME/data/` directory:
# Download the data.
cd $HOME/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
tar -xvf VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar
A brief introduction to the PASCAL VOC data: the VOC2007 data comes as two tar files, VOCtrainval and VOCtest, while the VOC2012 data comes as a single VOCtrainval tar file. After extraction, the 2007 and 2012 data sit in the `VOC2007` and `VOC2012` subdirectories of `VOCdevkit`. Each subdirectory contains five folders: `Annotations`, `ImageSets`, `JPEGImages`, `SegmentationClass`, and `SegmentationObject`. For SSD's detection task we need the XML annotation files in `Annotations`, the `trainval.txt` and `test.txt` lists in `ImageSets/Main/`, and the images in `JPEGImages`.
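As a quick check that the extracted layout matches what the SSD scripts expect, a small sketch like the following verifies the pieces listed above (the path assumes the `$HOME/data` location used in this guide):

```python
import os

voc_root = os.path.expanduser("~/data/VOCdevkit")  # assumed download location

for year in ("VOC2007", "VOC2012"):
    base = os.path.join(voc_root, year)
    # The pieces SSD actually uses: XML annotations, image lists, and JPEG images.
    required = [
        os.path.join(base, "Annotations"),
        os.path.join(base, "JPEGImages"),
        os.path.join(base, "ImageSets", "Main", "trainval.txt"),
    ]
    if year == "VOC2007":
        required.append(os.path.join(base, "ImageSets", "Main", "test.txt"))
    for path in required:
        print("OK " if os.path.exists(path) else "MISSING ", path)
```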
- Create the LMDB files.
cd $CAFFE_ROOT
# Create the trainval.txt, test.txt, and test_name_size.txt in data/VOC0712/
./data/VOC0712/create_list.sh
# You can modify create_data.sh if needed.
# Encode the original trainval and test images into the following LMDB files:
# - $HOME/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb
# - $HOME/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
# and make soft links at examples/VOC0712/
./data/VOC0712/create_data.sh
Each line of the generated trainval.txt contains the path of an image and the path of its annotation file, separated by a space. The generated test_name_size.txt lists the id, height, and width of each test image. Finally, two LMDB databases, trainval and test, are produced; they are used to train and test the SSD model, respectively. Soft links to these two databases are also created under `.../caffe/examples/VOC0712/`.
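To confirm that the two databases were written correctly, one option is to open them with the `lmdb` Python package and count the entries; a rough sketch, assuming the soft links under examples/VOC0712/ created above:

```python
import lmdb

# Soft links created by create_data.sh (paths assumed from the steps above).
for name in ("VOC0712_trainval_lmdb", "VOC0712_test_lmdb"):
    env = lmdb.open("examples/VOC0712/" + name, readonly=True, lock=False)
    with env.begin() as txn:
        # With the default lists, trainval should hold 16551 records and test 4952.
        print(name, txn.stat()["entries"])
    env.close()
```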
Train/Eval
- Train your own model and evaluate it.
# Create the model definition files and save the model training snapshots to:
# - $CAFFE_ROOT/models/VGGNet/VOC0712/SSD_300x300/
# and job file, log file, and the python script in:
# - $CAFFE_ROOT/jobs/VGGNet/VOC0712/SSD_300x300/
# Save the current evaluation results to:
# - $HOME/data/VOCdevkit/results/VOC2007/SSD_300x300/
# After 120K iterations, you should get a mAP of 77.*.
python examples/ssd/ssd_pascal.py
If you would rather not train the model yourself, you can download a pre-trained model here. Note that it was trained on the PASCAL VOC dataset.
Reading the source of ssd_pascal.py shows that training an SSD model requires several input files:
train_data = "examples/VOC0712/VOC0712_trainval_lmdb"
test_data = "examples/VOC0712/VOC0712_test_lmdb"
name_size_file = "data/VOC0712/test_name_size.txt"
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
label_map_file = "data/VOC0712/labelmap_voc.prototxt"
train_net_file = "models/VGGNet/VOC0712/SSD_300x300/train.prototxt"
test_net_file = "models/VGGNet/VOC0712/SSD_300x300/test.prototxt"
deploy_net_file = "models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt"
solver_file = "models/VGGNet/VOC0712/SSD_300x300/solver.prototxt"
其中锅移,train_data
和test_data
是之前創(chuàng)建的LMDB數(shù)據(jù)庫(kù)文件,用于訓(xùn)練和測(cè)試模型彬檀。name_size_file
是之前創(chuàng)建的測(cè)試圖像集的圖像id和size文件帆啃,用于模型的測(cè)試。pretrain_model
是base network部分(VGG_16的卷積層)的預(yù)訓(xùn)練參數(shù)窍帝。label_map_file
保存的是物體的name和label的映射文件努潘,用于訓(xùn)練和測(cè)試。這五個(gè)文件是之前都準(zhǔn)備好的.
The remaining four files, `train_net_file`, `test_net_file`, `deploy_net_file`, and `solver_file`, are generated automatically by the `ssd_pascal.py` script from the model definition and the training hyperparameters. For example, `train_net_file`, i.e. train.prototxt, is written out from the NetSpec and then copied into the job directory by `shutil.copy(train_net_file, job_dir)`; the relevant code fragment is:
# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
train=True, output_label=True, label_map_file=label_map_file,
transform_param=train_transform_param, batch_sampler=batch_sampler)
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
dropout=False)
AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)
# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
propagate_down=[True, True, False, False])
with open(train_net_file, 'w') as f:
print('name: "{}_train"'.format(model_name), file=f)
print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)
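ssd_pascal.py then writes a job script that launches ./build/tools/caffe train with the generated solver and the pre-trained weights. If you prefer to drive training from Python instead, a rough sketch with the pycaffe solver interface would look like this (the paths are the ones generated above; GPU 0 is assumed):

```python
import caffe

caffe.set_device(0)    # pick a GPU listed in ssd_pascal.py's gpus setting
caffe.set_mode_gpu()

solver = caffe.get_solver("models/VGGNet/VOC0712/SSD_300x300/solver.prototxt")

# Start from the reduced VGG16 weights, as the generated job does via --weights.
solver.net.copy_from("models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel")
solver.solve()         # runs until max_iter from the solver definition
```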
- Evaluate the model using the most recent snapshot.
# If you would like to evaluate the trained model, run the script:
python examples/ssd/score_ssd_pascal.py
- Test the model with a webcam. Note: press <kbd>esc</kbd> to stop.
# Run a live demo with a webcam and the pre-trained model:
python examples/ssd/ssd_pascal_webcam.py
Here is a demo video of an SSD500 model trained on the MSCOCO dataset.
See examples/ssd_detect.ipynb or examples/ssd/ssd_detect.cpp for how to detect objects with an SSD model, and examples/ssd/plot_detections.py for how to plot the detections produced by ssd_detect.cpp. If you would like to train on another dataset, see data/OTHERDATASET for more details; the COCO and ILSVRC2016 datasets are currently supported. It is recommended to use examples/ssd.ipynb to check whether a new dataset meets the requirements.
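To give a sense of what examples/ssd_detect.ipynb does, here is a rough pycaffe sketch of running the deployed SSD300 on one image. The paths are the VOC0712 files generated above, the snapshot name is only an example, and the output layout described in the comments is that of the DetectionOutput layer:

```python
import numpy as np
import caffe

caffe.set_mode_cpu()

# Deploy net and an example 120K-iteration snapshot from the training step above.
net = caffe.Net("models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt",
                "models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel",
                caffe.TEST)

# Preprocess: resize to 300x300, subtract the VGG mean, HWC->CHW, RGB->BGR.
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))
transformer.set_mean("data", np.array([104.0, 117.0, 123.0]))
transformer.set_raw_scale("data", 255)
transformer.set_channel_swap("data", (2, 1, 0))

image = caffe.io.load_image("examples/images/fish-bike.jpg")
net.blobs["data"].data[...] = transformer.preprocess("data", image)

# detection_out has shape (1, 1, N, 7):
# [image_id, label, confidence, xmin, ymin, xmax, ymax], coordinates in [0, 1].
detections = net.forward()["detection_out"][0, 0]
for det in detections:
    if det[2] >= 0.6:  # confidence threshold
        print(int(det[1]), float(det[2]), det[3:7])
```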
Models
Models trained on different datasets are provided for download. To reproduce the results in Table 6 of the paper, each model folder contains a `.caffemodel` file, several `.prototxt` files, and python script files.
- PASCAL VOC models:
- COCO models:
- ILSVRC models:
New datasets
In the first part of Train/Eval above, we described how to prepare the dataset files:
dbname_trainval_lmdb
dbname_test_lmdb
test_name_size.txt
labelmap_dbname.prototxt
VGG_ILSVRC_16_layers_fc_reduced.caffemodel
A new dataset means different training/test images, a different mapping from object names to labels, and different network definition parameters. So the first step is to produce the model inputs for the new image data, i.e. the five files listed above.
- `VGG_ILSVRC_16_layers_fc_reduced.caffemodel` contains the pre-trained weights of the VGG_16 convolutional layers; simply download it and use it as-is. Re-training the VGG_16 classification model is not covered here.
- `labelmap_dbname.prototxt` maps the object names used in the annotation files to integer labels. There are usually only a few classes, so the file can simply be written by hand. For example, a possible label map:
item { name: "none_of_the_above" label: 0 display_name: "background" }
item { name: "Car" label: 1 display_name: "car" }
item { name: "Bus" label: 2 display_name: "bus" }
item { name: "Van" label: 3 display_name: "van" }
...
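If the class list is long, the label map can also be generated instead of written by hand. A small sketch, assuming a hypothetical class list and output path for the new dataset (label 0 is reserved for the background class by SSD's convention):

```python
# Hypothetical class names for a new dataset; label 0 is the background class.
classes = ["Car", "Bus", "Van"]

entry = ('item {\n'
         '  name: "%s"\n'
         '  label: %d\n'
         '  display_name: "%s"\n'
         '}\n')

with open("data/mydataset/labelmap_mydataset.prototxt", "w") as f:
    f.write(entry % ("none_of_the_above", 0, "background"))
    for i, name in enumerate(classes, start=1):
        f.write(entry % (name, i, name.lower()))
```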
- `test_name_size.txt` stores the id, height, and width of every test image and is created by the `create_list.sh` script. Reading `create_list.sh` shows that it creates three txt files in total: `trainval.txt`, `test.txt`, and `dbname_name_size.txt`.
- Each line of `trainval.txt` and `test.txt` holds the path of an image file and the path of its annotation file, separated by a space. A fragment looks like this:
VOC2012/JPEGImages/2010_003429.jpg VOC2012/Annotations/2010_003429.xml
VOC2007/JPEGImages/008716.jpg VOC2007/Annotations/008716.xml
VOC2012/JPEGImages/2009_004804.jpg VOC2012/Annotations/2009_004804.xml
VOC2007/JPEGImages/005293.jpg VOC2007/Annotations/005293.xml
Note that the lines of trainval.txt are shuffled; test.txt does not need to be shuffled.
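For a new dataset, these two lists can be produced with a few lines of Python rather than by adapting create_list.sh; a rough sketch, assuming a hypothetical layout with images/ and annotations/ folders under the dataset root and an arbitrary 90/10 split:

```python
import os
import random

# Hypothetical dataset layout: ~/data/mydataset/{images,annotations}
root = os.path.expanduser("~/data/mydataset")
names = sorted(os.path.splitext(f)[0]
               for f in os.listdir(os.path.join(root, "images")) if f.endswith(".jpg"))

random.seed(0)
random.shuffle(names)
split = int(0.9 * len(names))   # 90% trainval, 10% test (arbitrary choice)
subsets = {"trainval.txt": names[:split], "test.txt": sorted(names[split:])}

for list_name, subset in subsets.items():
    with open(list_name, "w") as f:
        for n in subset:
            # Relative paths: image first, annotation second, space separated.
            f.write("images/%s.jpg annotations/%s.xml\n" % (n, n))
```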
- The `test_name_size.txt` file itself is generated by the `.../caffe/get_image_size` program, whose source is in `.../caffe/tools/get_image_size.cpp`. The program joins the dataset root directory with the test image paths listed in `test.txt` to obtain each image's absolute path, then computes its height and width automatically. The core code of `get_image_size.cpp` is:
// Storing to outfile
boost::filesystem::path root_folder(argv[1]);
std::ofstream outfile(argv[3]);
if (!outfile.good()) {
  LOG(FATAL) << "Failed to open file: " << argv[3];
}
int height, width;
int count = 0;
for (int line_id = 0; line_id < lines.size(); ++line_id) {
  boost::filesystem::path img_file = root_folder / lines[line_id].first;
  GetImageSize(img_file.string(), &height, &width);
  std::string img_name = img_file.stem().string();
  if (map_name_id.size() == 0) {
    outfile << img_name << " " << height << " " << width << std::endl;
  } else {
    CHECK(map_name_id.find(img_name) != map_name_id.end());
    int img_id = map_name_id.find(img_name)->second;
    outfile << img_id << " " << height << " " << width << std::endl;
  }
  if (++count % 1000 == 0) {
    LOG(INFO) << "Processed " << count << " files.";
  }
}
// write the last batch
if (count % 1000 != 0) {
  LOG(INFO) << "Processed " << count << " files.";
}
outfile.flush();
outfile.close();
A fragment of the resulting test_name_size.txt looks like this:
000001 500 353
000002 500 335
000003 375 500
000004 406 500
000006 375 500
000008 375 500
000010 480 354
現(xiàn)在陷揪,
trainval.txt
test.txt
和test_name_size.txt
的內(nèi)容已經(jīng)很清晰了惋鸥,可以利用現(xiàn)成的代碼程序,適當(dāng)修改圖像數(shù)據(jù)集名稱(chēng)和路徑就可以創(chuàng)建這三個(gè)文件悍缠。當(dāng)然卦绣,也可以根據(jù)自己的編程喜好,重新編寫(xiě)腳本生成符合上面格式的txt文件即可飞蚓。 -
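As an example of that last option, here is a rough Python equivalent of get_image_size for a new dataset, assuming the hypothetical ~/data/mydataset layout and the test.txt produced earlier (OpenCV is used only to read the image dimensions):

```python
import os
import cv2

root = os.path.expanduser("~/data/mydataset")   # hypothetical dataset root

with open("test.txt") as f_in, open("test_name_size.txt", "w") as f_out:
    for line in f_in:
        img_rel = line.split()[0]                # image path comes first on each line
        img = cv2.imread(os.path.join(root, img_rel))
        name = os.path.splitext(os.path.basename(img_rel))[0]
        # Same format as get_image_size writes: "<id> <height> <width>"
        f_out.write("%s %d %d\n" % (name, img.shape[0], img.shape[1]))
```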
- `dbname_trainval_lmdb` is produced by the `create_data.sh` script, whose core step is running the python script `.../caffe/scripts/create_annoset.py`. That script takes the previously prepared `labelmap_dbname.prototxt` and `trainval.txt` as inputs, along with a few configurable options.
The core of `.../caffe/scripts/create_annoset.py` is in turn to run the `.../caffe/build/tools/convert_annoset` program; `labelmap_dbname.prototxt` and `trainval.txt` are prepared exactly for `convert_annoset`, whose source is in `.../caffe/tools/convert_annoset.cpp`. The core code that creates and writes the database is:
// Create a new database
scoped_ptr<db::DB> db(db::GetDB(FLAGS_backend));
db->Open(argv[3], db::NEW);
scoped_ptr<db::Transaction> txn(db->NewTransaction());
// Store the data in the database
std::string root_folder(argv[1]);
AnnotatedDatum anno_datum;
Datum* datum = anno_datum.mutable_datum();
int count = 0;
int data_size = 0;
bool data_size_initialized = false;
for (int line_id = 0; line_id < lines.size(); ++line_id) {
bool status = true;
std::string enc = encode_type;
if (encoded && !enc.size()) {
// Guess the encoding type from the file name
string fn = lines[line_id].first;
size_t p = fn.rfind('.');
if ( p == fn.npos )
LOG(WARNING) << "Failed to guess the encoding of '" << fn << "'";
enc = fn.substr(p);
std::transform(enc.begin(), enc.end(), enc.begin(), ::tolower);
}
filename = root_folder + lines[line_id].first;
if (anno_type == "classification") {
label = boost::get<int>(lines[line_id].second);
status = ReadImageToDatum(filename, label, resize_height, resize_width,
min_dim, max_dim, is_color, enc, datum);
} else if (anno_type == "detection") {
labelname = root_folder + boost::get<std::string>(lines[line_id].second);
status = ReadRichImageToAnnotatedDatum(filename, labelname, resize_height,
resize_width, min_dim, max_dim, is_color, enc, type, label_type,
name_to_label, &anno_datum);
anno_datum.set_type(AnnotatedDatum_AnnotationType_BBOX);
}
if (status == false) {
LOG(WARNING) << "Failed to read " << lines[line_id].first;
continue;
}
if (check_size) {
if (!data_size_initialized) {
data_size = datum->channels() * datum->height() * datum->width();
data_size_initialized = true;
} else {
const std::string& data = datum->data();
CHECK_EQ(data.size(), data_size) << "Incorrect data field size "
<< data.size();
}
}
// Build a sequential key for this record
string key_str = caffe::format_int(line_id, 8) + "_" + lines[line_id].first;
// Serialize the record and put it into the database
string out;
CHECK(anno_datum.SerializeToString(&out));
txn->Put(key_str, out);
if (++count % 1000 == 0) {
// Commit db
txn->Commit();
txn.reset(db->NewTransaction());
LOG(INFO) << "Processed " << count << " files.";
}// end if
}//end for
// Write the last batch
if (count % 1000 != 0) {
txn->Commit();
LOG(INFO) << "Processed " << count << " files.";
}
The most important line in this fragment is the call to `ReadRichImageToAnnotatedDatum()`, which writes the image and its annotations together into the `anno_datum` variable; the record is then serialized and put into the transaction, which is committed to the database once a batch of records has accumulated. `ReadRichImageToAnnotatedDatum()` is provided by Caffe and defined in src/caffe/util/io.cpp; its source, together with the `ReadImageToDatum` and `GetImageSize` functions it calls, is shown below:
bool ReadImageToDatum(const string& filename, const int label,
const int height, const int width, const int min_dim, const int max_dim,
const bool is_color, const std::string & encoding, Datum* datum) {
cv::Mat cv_img = ReadImageToCVMat(filename, height, width, min_dim, max_dim,
is_color);
if (cv_img.data) {
if (encoding.size()) {
if ( (cv_img.channels() == 3) == is_color && !height && !width &&
!min_dim && !max_dim && matchExt(filename, encoding) ) {
datum->set_channels(cv_img.channels());
datum->set_height(cv_img.rows);
datum->set_width(cv_img.cols);
return ReadFileToDatum(filename, label, datum);
}
EncodeCVMatToDatum(cv_img, encoding, datum);
datum->set_label(label);
return true;
}
CVMatToDatum(cv_img, datum);
datum->set_label(label);
return true;
} else {
return false;
}
}
void GetImageSize(const string& filename, int* height, int* width) {
cv::Mat cv_img = cv::imread(filename);
if (!cv_img.data) {
LOG(ERROR) << "Could not open or find file " << filename;
return;
}
*height = cv_img.rows;
*width = cv_img.cols;
}
bool ReadRichImageToAnnotatedDatum(const string& filename,
const string& labelfile, const int height, const int width,
const int min_dim, const int max_dim, const bool is_color,
const string& encoding, const AnnotatedDatum_AnnotationType type,
const string& labeltype, const std::map<string, int>& name_to_label,
AnnotatedDatum* anno_datum) {
// Read image to datum.
bool status = ReadImageToDatum(filename, -1, height, width,
min_dim, max_dim, is_color, encoding,
anno_datum->mutable_datum());
if (status == false) {
return status;
}
anno_datum->clear_annotation_group();
if (!boost::filesystem::exists(labelfile)) {
return true;
}
switch (type) {
case AnnotatedDatum_AnnotationType_BBOX:
int ori_height, ori_width;
GetImageSize(filename, &ori_height, &ori_width);
if (labeltype == "xml") {
return ReadXMLToAnnotatedDatum(labelfile, ori_height, ori_width,
name_to_label, anno_datum);
} else if (labeltype == "json") {
return ReadJSONToAnnotatedDatum(labelfile, ori_height, ori_width,
name_to_label, anno_datum);
} else if (labeltype == "txt") {
return ReadTxtToAnnotatedDatum(labelfile, ori_height, ori_width,
anno_datum);
} else {
LOG(FATAL) << "Unknown label file type.";
return false;
}
break;
default:
LOG(FATAL) << "Unknown annotation type.";
return false;
}
}
As the code above shows, these functions in turn call two more io.cpp methods, `ReadFileToDatum` and `ReadXMLToAnnotatedDatum`, which write the image and its XML annotation into `anno_datum`: the image goes into `anno_datum`'s `datum` field (via `mutable_datum`), while the XML annotation is stored under `anno_datum`'s `annotation_group` -> `annotation` -> `bbox`, with each `annotation_group` also carrying the corresponding label.
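To see this structure concretely, records can be read back from the trainval LMDB with the compiled caffe protos; a minimal sketch, assuming the VOC0712 LMDB created earlier and the pycaffe build on PYTHONPATH:

```python
import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open("examples/VOC0712/VOC0712_trainval_lmdb", readonly=True, lock=False)
with env.begin() as txn:
    key, value = next(iter(txn.cursor()))
    anno_datum = caffe_pb2.AnnotatedDatum()
    anno_datum.ParseFromString(value)
env.close()

# The image lives in the Datum; the labels and boxes in annotation_group.
print(key, anno_datum.datum.width, anno_datum.datum.height)
for group in anno_datum.annotation_group:
    for anno in group.annotation:
        bbox = anno.bbox  # NormalizedBBox with coordinates in [0, 1]
        print(group.group_label, bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax)
```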
- `dbname_test_lmdb` is created in the same way as `dbname_trainval_lmdb` above.
- Use examples/ssd.ipynb to verify that the files generated above are correct.
[1] We use examples/convert_model.ipynb to extract a VOC model from a pretrained COCO model.