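Contents
Deep learning framework Caffe (1): Building and installing
Deep learning framework Caffe (2): Training and using models
Deep learning framework Caffe (3): Defining custom networks with NetSpec
Deep learning framework Caffe (4): Visualization and parameter extraction
Deep learning framework Caffe (5): Converting models to other frameworks
Updated before 6.23
Training
The CAFFE_ROOT/tools directory contains the source code for common operations needed for training, testing, and so on (.cpp files whose names make their purpose obvious). The build compiles these files and, once it finishes, places the corresponding executables under build/tools, as shown in the figure below:
Data preparation before training
See here.
Training process
See here.
Notes on several files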
xxx_train_test_full.prototxt
xxx_solver.prototxt
xxx_iter_xxx.caffemodel
xxx_mean.binaryproto
xxx_mean.npy
xxx_classes.txt (note: a table mapping class names to index numbers, usually needed when classifying from Python or C++), for example:
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
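A minimal sketch of how such a class file can be used: each line's position is its index, so the index of the largest network output maps straight to a name. The function names `load_class_names` and `label_for` are made up for this illustration and are not part of Caffe.

```python
def load_class_names(lines):
    """Map line index -> class name.

    Blank lines are skipped, which assumes they only occur at the end
    of the file (a blank line in the middle would shift the indices).
    """
    names = [line.strip() for line in lines if line.strip()]
    return dict(enumerate(names))


def label_for(index_to_name, scores):
    """Return (class name, score) for the highest-scoring index."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return index_to_name[best], scores[best]
```

Typical use would be `load_class_names(open("xxx_classes.txt"))`, followed by `label_for(names, prob_vector)` on the output of the network's final layer.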
- Caffe directory layout
Source code home page
./src
./include
./docs
./python (Python interface library)
./matlab
./models
./examples
./scripts
./tools
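Notes:
a. For the roles of the three files train.txt, test.txt, and val.txt needed by the convert_imageset command, see here.
b. The scripts in the linked posts simply wrap the individual steps of the training process in sh scripts. The usual flow is:
convert to lmdb (convert_imageset) -> train (caffe train) -> test (caffe test). The sh scripts simplify setting the parameters of these commands, but they are not ideal: when you retrain, you have to manually delete the two directories created by the lmdb conversion before the scripts will run again. It would be more convenient to merge them into a single script that deletes and creates the necessary directories automatically.
c. There are many ways to script Caffe training. Some open-source algorithms, such as Faster R-CNN and DeepID, also provide Python training scripts. This article covers only the most basic, native way to train. For Faster R-CNN you can use the training interface provided by its authors directly; the essence is the same (run the relevant programs under tools in order).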
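The single merged script suggested in note b could look roughly like the sketch below. It assumes the usual command-line forms `convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME`, `caffe train --solver=...`, and `caffe test --model=... --weights=...`. The directory names `train_lmdb` and `val_lmdb`, and all function names, are hypothetical choices for this example.

```python
import os
import shutil
import subprocess


def reset_lmdb_dirs(work_dir, splits=("train", "val")):
    """Delete lmdb directories left over from a previous run."""
    removed = []
    for split in splits:
        path = os.path.join(work_dir, split + "_lmdb")  # hypothetical naming
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


def build_pipeline(tools_dir, image_root, work_dir, solver, model, weights):
    """Return the commands for: convert to lmdb -> train -> test."""
    convert = os.path.join(tools_dir, "convert_imageset")
    caffe = os.path.join(tools_dir, "caffe")
    commands = []
    for split in ("train", "val"):
        # image_root should end with "/" because convert_imageset
        # concatenates it with each file name from the list file.
        commands.append([convert, "--shuffle", image_root,
                         os.path.join(work_dir, split + ".txt"),
                         os.path.join(work_dir, split + "_lmdb")])
    commands.append([caffe, "train", "--solver=" + solver])
    commands.append([caffe, "test", "--model=" + model, "--weights=" + weights])
    return commands


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
```

Calling `reset_lmdb_dirs(work_dir)` before `run_pipeline(build_pipeline(...), dry_run=False)` removes the need to delete the lmdb directories by hand on every retraining run.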
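- Using Python
When Python imports a third-party library, it searches three kinds of locations: the system default third-party library directory (/usr/lib/python2.7/dist-packages), the directories listed in the $PYTHONPATH environment variable, and the directory of the script being run (or the current directory in an interactive session). When a script starts, these directories are collected and assigned to sys.path, which is accessible through the sys module. So first make sure that Caffe's Python interface (in the CAFFE_ROOT/python directory) is on Python's search path. A simple way to add it is to put the following statement in the Python script that calls Caffe (the .py file), before importing caffe: sys.path.insert(0, "CAFFE_ROOT/python")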
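As a small sketch of the sys.path approach, the helper below adds the Caffe Python directory only once, so repeated calls do not pile up duplicate entries. The function name `add_caffe_to_path` and the use of a CAFFE_ROOT environment variable are assumptions made for this example.

```python
import os
import sys


def add_caffe_to_path(caffe_root):
    """Put <caffe_root>/python at the front of sys.path (once)."""
    python_dir = os.path.join(caffe_root, "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir
```

In a script this would be called as `add_caffe_to_path(os.environ["CAFFE_ROOT"])` before the `import caffe` line.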
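- Using C++
After the source is built, create a new project and configure its header search paths and library directories according to Caffe's header and library directories.
Header paths: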
CAFFE_ROOT/include
CAFFE_ROOT/src
CUDA_ROOT/include
/usr/include (headers of other dependencies: Boost, protobuf, etc.)
Library paths:
CAFFE_ROOT/build/lib
CUDA_ROOT/lib64
/usr/lib (library directory of other dependencies: Boost, protobuf, etc.)
Usage
For Python
import os
from functools import partial

import caffe
import cv2
import numpy as np

from synset_words import WordCode  # the author's own helper module, not part of Caffe


class CnnClassify(object):
    def __init__(self, path='/trainedCaffeData/', **kwargs):
        """
        :param path: directory that contains the model files
        :param kwargs: model_file, params_file, mean_file, synset_file
            (file names relative to path), plus optional use_gpu and img_size
        """
        print(os.path.abspath(path))
        if kwargs.get("use_gpu", False):
            caffe.set_mode_gpu()  # GPU or CPU
            caffe.set_device(0)
        else:
            caffe.set_mode_cpu()
        self.img_size = kwargs.get("img_size", (48, 48))
        join_func = partial(os.path.join, path)
        for k in ["model_file", "params_file", "mean_file", "synset_file"]:
            kwargs[k] = join_func(kwargs[k])
        self.net = caffe.Net(kwargs["model_file"],   # defines the structure of the model
                             kwargs["params_file"],  # contains the trained weights
                             caffe.TEST)             # use test mode (e.g., don't perform dropout)
        self.__setReadFormat(kwargs["mean_file"])
        self.synset_words = WordCode(filename=kwargs["synset_file"])

    def __setReadFormat(self, model_mean):
        """
        :param model_mean: path to the .npy file holding the training-set mean
        """
        print(self.net.blobs['data'].data.shape)
        self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
        # Load the mean file and compute the mean of each of the three BGR channels
        mu = np.load(model_mean).mean(1).mean(1)
        # Move the channel axis first: H x W x C -> C x H x W
        self.transformer.set_transpose('data', (2, 0, 1))
        self.transformer.set_mean('data', mu)
        self.transformer.set_raw_scale('data', 255)  # rescale from [0, 1] to [0, 255]
        # swap channels from RGB to BGR
        self.transformer.set_channel_swap('data', (2, 1, 0))

    def predict_batch(self, img_arr):
        # Resize the input blob to the batch size (image size defaults to 48x48)
        self.net.blobs['data'].reshape(len(img_arr), 3, self.img_size[0], self.img_size[1])
        img_inputs = np.zeros((len(img_arr), 3, self.img_size[0], self.img_size[1]))
        for ind, img_data in enumerate(img_arr):
            # NOTE: load_image_arr is not in stock pycaffe, which only provides
            # caffe.io.load_image(filename); it is presumably the author's own
            # helper for images that are already in memory.
            img_inputs[ind, :, :, :] = self.transformer.preprocess('data', caffe.io.load_image_arr(img_data))
        self.net.blobs['data'].data[...] = img_inputs
        out = self.net.forward()
        predictions = []
        for i in range(0, len(img_arr)):
            output_prob = out['prob'][i]  # the output probability vector for the i-th image in the batch
            pred_label = output_prob.argmax()
            word = self.synset_words.getUnicode(pred_label)
            predictions.append({"Label": word, "Prob": output_prob[pred_label]})
            # print("recognized", pred_label)
        return predictions

    def predict(self, img_arr):
        # Single-image batch (image size defaults to 48x48)
        self.net.blobs['data'].reshape(1, 3, self.img_size[0], self.img_size[1])
        img_input = self.transformer.preprocess('data', caffe.io.load_image_arr(img_arr))
        self.net.blobs['data'].data[...] = img_input
        output_prob = self.net.forward()['prob'][0]
        pred_label = output_prob.argmax()
        word = self.synset_words.getUnicode(pred_label)
        return word, output_prob[pred_label]


def testCaffeCnn():
    import glob
    test = CnnClassify(path='E:/TibetOCR/Models/tibet_0323/',
                       model_file='tibet_full_train_test.prototxt',
                       params_file='tibet_full_iter_2000.caffemodel',
                       mean_file='ocr_mean.npy',
                       synset_file='synsetWords_79.pkl',
                       use_gpu=True,
                       img_size=(48, 48)  # was imgSize, which __init__ never reads
                       )
    imageBasePath = 'E:/TibetOCR/Data/samples/*.jpg'
    imageList = glob.glob(imageBasePath)
    predict_labels = []
    for imagefile in imageList:
        im = cv2.imread(imagefile)
        label = test.predict(im)
        print("Result: {}, confidence: {}".format(label[0], label[1]))
        cv2.imshow('im', im)
        cv2.waitKey(0)
        predict_labels.append(label)
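The Transformer settings used above amount to a fixed sequence of array operations. As far as I understand pycaffe's Transformer.preprocess, it applies the transpose, then the channel swap, then the raw scale, and finally the mean subtraction. The pure-Python sketch below mimics that order on nested lists so the effect of each setting is visible. It is an illustration, not pycaffe's code, and the function name `preprocess` and its parameters are made up for this example.

```python
def preprocess(image_hwc, channel_mean, raw_scale=255.0, channel_swap=(2, 1, 0)):
    """Mimic the Transformer steps on an H x W x C nested list in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    # 1. transpose (2, 0, 1): H x W x C -> C x H x W
    chw = [[[image_hwc[y][x][c] for x in range(width)]
            for y in range(height)]
           for c in range(channels)]
    # 2. channel swap (2, 1, 0): RGB -> BGR
    chw = [chw[c] for c in channel_swap]
    # 3. raw scale to [0, 255], then 4. subtract the per-channel mean.
    #    The mean is indexed after the swap, so it must be in BGR order.
    return [[[value * raw_scale - channel_mean[c] for value in row]
             for row in chw[c]]
            for c in range(len(chw))]
```

For a single RGB pixel (1.0, 0.5, 0.0) and a BGR mean of (10, 20, 30), the result holds the blue plane first: -10.0, then 107.5, then 225.0.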
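For C++
The C++ classification example that ships with Caffe is CAFFE_ROOT/examples/cpp_classification/classification.cpp.
Based on existing posts, I wrapped it in a C++ Classifier class. Its declaration and implementation are as follows: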
//classifier.h
#pragma once
#include <algorithm>
#include <vector>
#include "caffe/caffe.hpp"
#include "caffe/util/io.hpp"
#include "caffe/blob.hpp"
#include "opencv2/opencv.hpp"
#include "boost/smart_ptr/shared_ptr.hpp"
// Caffe's required library
//#pragma comment(lib, "caffe.lib")
using namespace boost;
using namespace caffe;
/* Pair (label, confidence) representing a prediction. */
typedef std::pair<std::string, float> Prediction;
//#define CPU_ONLY // run the program on the CPU only
class Classifier
{
public:
Classifier();
Classifier(const std::string& model_file,
const std::string& trained_file,
const std::string& mean_file,
const std::string& label_file);
~Classifier();
//string classFaces(Rect face, Mat frame, int *w, string name);
int LoadModelFile(std::string caffePath);
Prediction Classify(const cv::Mat& img);
std::vector<Prediction> ClassifyBatch(std::vector< cv::Mat>& img_batch);
private:
void SetMean(const std::string& mean_file);
int InitCaffeNet();
std::vector<float> Predict(const cv::Mat& img);
void WrapInputLayer(std::vector<cv::Mat>* input_channels);
void Preprocess(const cv::Mat& img,
std::vector<cv::Mat>* input_channels);
std::string model_file_;
std::string trained_file_;
std::string mean_file_;
std::string label_file_;
boost::shared_ptr<Net<float> > net_;
cv::Size input_geometry_;
int num_channels_;
cv::Mat mean_;
std::vector<string> labels_;
};
//classifier.cpp
#include "include/Classifier.h"
#include <iomanip>
#include <algorithm>
#include <time.h>
using namespace caffe;
/* Return the index of the largest value in vector v. */
int Argmax(std::vector<float>& v) {
std::vector<float>::iterator biggest = std::max_element(v.begin(), v.end());
return std::distance(v.begin(), biggest);
}
void imagePadding(cv::Mat src, cv::Mat &dst)
{
int maxEdge = MAX(src.cols, src.rows);
int paddingWidth = abs(src.cols - src.rows);
int extraPaddingWidth = MIN(src.cols, src.rows) / 2;
int xPaddingWidth = abs(src.cols - maxEdge) / 2 + extraPaddingWidth;
int yPaddingWidth = abs(src.rows - maxEdge) / 2 + extraPaddingWidth;
copyMakeBorder(src.clone(), dst, yPaddingWidth, yPaddingWidth, xPaddingWidth, xPaddingWidth, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255));
//imshow("src", src);
//imshow("dst", dst);
//waitKey(0);
}
Classifier::~Classifier(){ }
Classifier::Classifier(){ }
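int Classifier::LoadModelFile(std::string caffePath)
{
model_file_ = caffePath + "tibet_full_train_test.prototxt";
trained_file_ = caffePath + "tibet_full.caffemodel";
mean_file_ = caffePath + "Tibet_mean.binaryproto";
label_file_ = caffePath + "synsetWords.txt";
// Intended contract: return 1 if all files exist, otherwise 0.
// The original fell off the end of the function when InitCaffeNet() returned 0.
// Note that InitCaffeNet() as written always returns 1 when it returns at all.
return InitCaffeNet() ? 1 : 0;
}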
int Classifier::InitCaffeNet()
{
#ifdef CPU_ONLY
Caffe::set_mode(Caffe::CPU);
#else
Caffe::set_mode(Caffe::GPU);
#endif
/* Load the network. */
net_.reset(new Net<float>(model_file_, TEST));
net_->CopyTrainedLayersFrom(trained_file_);
CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
Blob<float>* input_layer = net_->input_blobs()[0];
int num_inputs = net_->num_inputs();
int num_outputs = net_->num_outputs();
num_channels_ = input_layer->channels();
CHECK(num_channels_ == 3 || num_channels_ == 1) << "Input layer should have 1 or 3 channels.";
input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
/* Load the binaryproto mean file. */
SetMean(mean_file_);
/* Load labels. */
std::ifstream labels(label_file_.c_str());
CHECK(labels) << "Unable to open labels file " << label_file_;
string line;
while (std::getline(labels, line))
labels_.push_back(string(line));
Blob<float>* output_layer = net_->output_blobs()[0];
CHECK_EQ(labels_.size(), output_layer->channels())
<< "Number of labels is different from the output layer dimension.";
return 1;
}
Classifier::Classifier(const std::string& model_file,
const std::string& trained_file,
const std::string& mean_file,
const std::string& label_file)
{
model_file_ = model_file;
trained_file_ = trained_file;
mean_file_ = mean_file;
label_file_ = label_file;
InitCaffeNet();
}
static bool PairCompare(const std::pair<float, int>& lhs,
const std::pair<float, int>& rhs)
{
return lhs.first > rhs.first;
}
/* Return the top-1 prediction as a (label, confidence) pair. */
Prediction Classifier::Classify(const cv::Mat& img) {
std::vector<float> output = Predict(img);
int maxIdx = Argmax(output);
//std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
return std::make_pair(labels_[maxIdx],output[maxIdx]);
//stringstream stream;
//stream << maxIdx;
//return std::make_pair(stream.str(), output[maxIdx]);
}
/* Load the mean file in binaryproto format. */
void Classifier::SetMean(const std::string& mean_file) {
Blob<float> mean_blob;
BlobProto blob_proto;
float *mean_ptr;
unsigned int num_pixel;
bool succeed = ReadProtoFromBinaryFile(mean_file, &blob_proto);
if (succeed)
{
mean_blob.FromProto(blob_proto);
CHECK_EQ(mean_blob.channels(), num_channels_)
<< "Number of channels of mean file doesn't match input layer.";
num_pixel = mean_blob.count(); /* NCHW=1x3x256x256=196608 */
//mean_ptr = (float *)mean_blob.cpu_data();
mean_ptr = mean_blob.mutable_cpu_data();
/* The format of the mean file is planar 32-bit float BGR or grayscale. */
std::vector<cv::Mat> channels;
for (int i = 0; i < num_channels_; ++i)
{
/* Extract an individual channel. */
cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, mean_ptr);
//cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1);
//memcpy(channel.data, data, mean_blob.width()*mean_blob.height()*sizeof(float));
channels.push_back(channel);
//imshow("img", channel);
//waitKey(0);
mean_ptr += mean_blob.height() * mean_blob.width();
}
/* Merge the separate channels into a single image. */
//cv::Mat mean(mean_blob.height(), mean_blob.width(), CV_32FC1);//;//
cv::Mat mean;
cv::merge(channels, mean);
/* Compute the global mean pixel value and create a mean image
* filled with this value. */
cv::Scalar channel_mean = cv::mean(mean);//mean);//channels[0]
mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
//imshow("img1", mean_);
//waitKey(0);
}
}
std::vector<float> Classifier::Predict(const cv::Mat& img)
{
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, num_channels_,input_geometry_.height, input_geometry_.width);
/* Forward dimension change to all layers. */
net_->Reshape();
std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels);
Preprocess(img, &input_channels);
net_->Forward(0);
Blob<float>* output_layer = net_->output_blobs()[0];
const float* begin = output_layer->cpu_data();
const float* end = begin + output_layer->channels();
return std::vector<float>(begin, end);
}
std::vector<Prediction> Classifier::ClassifyBatch(std::vector< cv::Mat>& img_batch)
{
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(img_batch.size(), num_channels_, input_geometry_.height, input_geometry_.width);
/* Forward dimension change to all layers. */
net_->Reshape();
std::vector<cv::Mat> input_data;
WrapInputLayer(&input_data);
//clock_t st_tm = clock();
std::vector<cv::Mat>::iterator it = input_data.begin();
for (int i = 0; i < img_batch.size(); i++)
{
std::vector<cv::Mat>tmp_channls(3);
tmp_channls.assign(input_data.begin() + i*num_channels_, input_data.begin() + (i + 1)*num_channels_);
Preprocess(img_batch[i], &tmp_channls);
}
//std::cout << "do imgPreprocess cost time : " << (double)(clock() - st_tm) / CLOCKS_PER_SEC << std::endl;
net_->Forward(0);
Blob<float>* output_layer = net_->output_blobs()[0];
std::vector<Prediction>predictions;
/* Copy the output layer to a std::vector */
for (int i = 0; i < img_batch.size(); i++)
{
const float* begin = output_layer->cpu_data()+i*output_layer->channels();
const float* end = begin + output_layer->channels();
std::vector<float> output = std::vector<float>(begin, end);
int maxIdx = Argmax(output);
//std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
predictions.push_back(std::make_pair(labels_[maxIdx], output[maxIdx]));
}
return predictions;
}
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
Blob<float>* input_layer = net_->input_blobs()[0];
int width = input_layer->width();
int height = input_layer->height();
float* input_data = input_layer->mutable_cpu_data();
for (int j = 0; j < input_layer->num(); j++)
{
for (int i = 0; i < input_layer->channels(); ++i) {
cv::Mat channel(height, width, CV_32FC1, input_data);
input_channels->push_back(channel);
input_data += width * height;
}
}
}
void Classifier::Preprocess(const cv::Mat& img,
std::vector<cv::Mat>* input_channels) {
/* Convert the input image to the input image format of the network. */
cv::Mat img_padded=img;
//imagePadding(img, img_padded);
cv::Mat sample;
if (img_padded.channels() == 3 && num_channels_ == 1)
cv::cvtColor(img_padded, sample, cv::COLOR_BGR2GRAY);
else if (img_padded.channels() == 4 && num_channels_ == 1)
cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2GRAY);
else if (img_padded.channels() == 4 && num_channels_ == 3)
cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2BGR);
else if (img_padded.channels() == 1 && num_channels_ == 3)
cv::cvtColor(img_padded, sample, cv::COLOR_GRAY2BGR);
else
sample = img_padded;
cv::Mat sample_resized;
if (sample.size() != input_geometry_)
cv::resize(sample, sample_resized, input_geometry_);
else
sample_resized = sample;
cv::Mat sample_float;
if (num_channels_ == 3)
sample_resized.convertTo(sample_float, CV_32FC3);
else
sample_resized.convertTo(sample_float, CV_32FC1);
cv::Mat sample_normalized;
cv::subtract(sample_float, mean_, sample_normalized);
/* This operation will write the separate BGR planes directly to the
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
cv::split(sample_normalized, *input_channels);
//CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
// == net_->input_blobs()[0]->cpu_data())
// << "Input channels are not wrapping the input layer of the network.";
}
To use it, include the header classifier.h in your own project; you can then instantiate the class at the call site and call its Classify method.
A problem that may appear in your own project (quite likely on Windows):
F0519 14:54:12.494139 14504 layer_factory.hpp:77] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Convolution (known types: MemoryData)
One workaround is to create another header (cafferegister.h) that declares or registers the unknown layer types. The code is as follows:
#ifndef CAFFEREGISTER_H
#define CAFFEREGISTER_H
#include "caffe/common.hpp"
#include "caffe/layers/data_layer.hpp"
#include "caffe/layers/input_layer.hpp"
#include "caffe/layers/inner_product_layer.hpp"
#include "caffe/layers/conv_layer.hpp"
#include "caffe/layers/relu_layer.hpp"
#include "caffe/layers/pooling_layer.hpp"
#include "caffe/layers/softmax_layer.hpp"
#include "caffe/layers/lrn_layer.hpp"
#include "caffe/layers/dropout_layer.hpp"
namespace caffe
{
extern INSTANTIATE_CLASS(DataLayer);
//REGISTER_LAYER_CLASS(Data);
extern INSTANTIATE_CLASS(InputLayer);
//REGISTER_LAYER_CLASS(Input);
extern INSTANTIATE_CLASS(InnerProductLayer);
extern INSTANTIATE_CLASS(DropoutLayer);
//REGISTER_LAYER_CLASS(Dropout);
extern INSTANTIATE_CLASS(ConvolutionLayer);
extern INSTANTIATE_CLASS(ReLULayer);
extern INSTANTIATE_CLASS(PoolingLayer);
extern INSTANTIATE_CLASS(LRNLayer);
extern INSTANTIATE_CLASS(SoftmaxLayer);
#ifdef WINDOWS
REGISTER_LAYER_CLASS(Convolution);
REGISTER_LAYER_CLASS(ReLU);
REGISTER_LAYER_CLASS(Pooling);
REGISTER_LAYER_CLASS(Softmax);
REGISTER_LAYER_CLASS(LRN);
#endif
}
#endif
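To see why the error message lists only MemoryData as a known type, it helps to think of the layer factory as a table that is filled by registration code at program start. One common explanation for the Windows failure is that, when Caffe is linked as a static library, the object files holding those registrations are dropped by the linker, so the table stays nearly empty. The toy registry below is a hypothetical Python model of that mechanism, not Caffe's implementation; every name in it is made up for illustration.

```python
LAYER_REGISTRY = {}


def register_layer(type_name):
    """Decorator that records a layer class under its type name."""
    def decorator(factory):
        LAYER_REGISTRY[type_name] = factory
        return factory
    return decorator


def create_layer(type_name):
    """Look a layer up by name, failing the way the Caffe log does."""
    if type_name not in LAYER_REGISTRY:
        known = ", ".join(sorted(LAYER_REGISTRY))
        raise KeyError("Unknown layer type: %s (known types: %s)"
                       % (type_name, known))
    return LAYER_REGISTRY[type_name]()


@register_layer("MemoryData")
class MemoryDataLayer(object):
    """The only layer whose registration code actually ran."""
```

In this model, `create_layer("Convolution")` fails until some code that registers "Convolution" has been executed, which is the role the header above plays in the C++ project.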