Preface
For my research work I need to use an SVM model in a Python project. Plenty of libraries provide SVM models, such as libsvm and the OpenCV ml module, but a third party on this project uses MATLAB's vl_svm, so to stay consistent I have to bring vl_svm into my own code as well.
Development and test environment
- Windows 10 64-bit
- Anaconda 3, with Python 3.7
- pybind11
- VLFeat
About VLFeat
The VLFeat open source library implements popular computer vision algorithms specializing in image understanding and local features extraction and matching. Algorithms include Fisher Vector, VLAD, SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, SLIC superpixels, quick shift superpixels, large scale SVM training, and many others. It is written in C for efficiency and compatibility, with interfaces in MATLAB for ease of use, and detailed documentation throughout. It supports Windows, Mac OS X, and Linux. The latest version of VLFeat is 0.9.21.
VLFeat is an open-source library containing many popular computer vision algorithms, such as image understanding and recognition, local feature extraction, and matching.
VLFeat is implemented in C. It currently has a MATLAB interface and documentation, but no Python interface, so I will implement one myself!
VLFeat contains far more algorithms than the project needs; only Python interfaces for the SVM, HOG, and LBP algorithms are required, so this post focuses on the Python interface for the SVM.
Testing the SVM
VLFeat provides fairly detailed documentation and an API reference.
It also provides a simple example of training an SVM.
C code
- Training part
Input: four 2-D samples together with their labels y, where -1 marks a negative sample and +1 a positive sample.
Output: the trained SVM parameters, i.e. the weight vector (model) and the bias (bias).
#include <stdio.h>
#include <vl/svm.h>

int main()
{
  vl_size const numData = 4 ;
  vl_size const dimension = 2 ;
  /* four 2-D training samples, stored one after another (row major) */
  double x [dimension * numData] = {
    0.0, -0.5,
    0.6, -0.3,
    0.0,  0.5,
    0.6,  0.0} ;
  /* +1 = positive sample, -1 = negative sample */
  double y [numData] = {1, 1, -1, 1} ;
  double lambda = 0.01 ;
  const double * model ;
  double bias ;

  VlSvm * svm = vl_svm_new(VlSvmSolverSgd,
                           x, dimension, numData,
                           y,
                           lambda) ;
  vl_svm_train(svm) ;

  model = vl_svm_get_model(svm) ;
  bias = vl_svm_get_bias(svm) ;
  printf("model w = [ %f , %f ] , bias b = %f \n",
         model[0],
         model[1],
         bias) ;

  vl_svm_delete(svm) ;
  return 0 ;
}
- Testing part
The test code is very simple: feed in one sample and compute its decision value.
double svm_test(double* svm_w, double svm_b, double* inputData, int dims) {
    /* linear decision value: w . x + b */
    double result = 0;
    for (int i = 0; i < dims; i++)
    {
        result += svm_w[i] * inputData[i];
    }
    result += svm_b;
    return result;
}
Wrapping the Python interface
Wrapping the interface is mostly a matter of handling the conversion of input and output data; the rest of the code can be reused directly.
The SVM training function (the resulting Python-side calling convention is sketched after the tables below):
- Converting the input data

Language | Description | Parameter type
---|---|---
C | multiple input samples (as vectors) | double array / pointer
Python | multiple input samples (as vectors) | list

Language | Description | Parameter type
---|---|---
C | multiple labels | double array
Python | multiple labels | list

- Converting the output data

Language | Description | Parameter type
---|---|---
C | one weight vector plus one bias | double array / pointer plus one double
Python | one weight vector plus one bias | list with 2 items
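In other words, on the Python side the wrapper takes a list of sample vectors plus a list of labels, and returns a two-item list: the weight vector, then a one-element list holding the bias. A minimal sketch of that calling convention, assuming the vlfeat_svm module defined below has already been built and is importable:

import vlfeat_svm  # the pybind11 module built below

train_data = [[0.0, -0.5], [0.6, -0.3], [0.0, 0.5], [0.6, 0.0]]  # four 2-D samples
labels = [1.0, 1.0, -1.0, 1.0]                                   # +1 positive, -1 negative
weights, bias = vlfeat_svm.train_svm(train_data, labels, 0.01)   # lambda = 0.01
print(weights)   # weight vector w, one entry per dimension
print(bias[0])   # bias b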
Python interface code implementation
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include "svm_classifier.h"

namespace py = pybind11;

std::vector<std::vector<double>> train_svm(std::vector<std::vector<double>>& trainData, std::vector<double>& labels, double lambda) {
    std::vector<double> weights;
    double bias;
    int numData = trainData.size();
    int dims = trainData.at(0).size();

    // Flatten the samples into the contiguous row-major buffer VLFeat expects.
    double* x = new double[dims * numData];
    double* y = new double[numData];
    int cnt = 0;
    for (int i = 0; i < numData; i++)
    {
        for (int j = 0; j < dims; j++)
        {
            x[cnt] = trainData[i][j];
            cnt++;
        }
        y[i] = labels[i];
    }

    // Train with the SGD solver and read back the model.
    VlSvm* svm = vl_svm_new(VlSvmSolverSgd, x, dims, numData, y, lambda);
    vl_svm_train(svm);
    const double* model = vl_svm_get_model(svm);
    bias = vl_svm_get_bias(svm);
    for (int i = 0; i < dims; i++)
    {
        weights.push_back(model[i]);
    }
    vl_svm_delete(svm);
    delete[] x;
    delete[] y;

    // Return [weights, [bias]] so Python receives a two-item list.
    return std::vector<std::vector<double>>{weights, { bias }};
}

PYBIND11_MODULE(vlfeat_svm, m) {
    m.doc() = "Simple svm demo!";
    m.def("train_svm", &train_svm, py::arg("train_dataset"), py::arg("labels"), py::arg("lambda_value"));
}
Compiling this directly produces the .pyd dynamic library.
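For reference, one way to drive that build is with setuptools plus pybind11's build helpers (pybind11 >= 2.6). This is only a sketch: the source file name and the VLFeat include/library paths are placeholders and must be adjusted to the actual project layout.

from setuptools import setup
from pybind11.setup_helpers import Pybind11Extension, build_ext

ext_modules = [
    Pybind11Extension(
        "vlfeat_svm",                       # must match the PYBIND11_MODULE name
        ["svm_classifier.cpp"],             # placeholder source file name
        include_dirs=["path/to/vlfeat"],    # directory that contains vl/svm.h
        library_dirs=["path/to/vlfeat/bin/win64"],
        libraries=["vl"],                   # link against the VLFeat library
    ),
]

setup(name="vlfeat_svm", ext_modules=ext_modules, cmdclass={"build_ext": build_ext})

Running python setup.py build_ext --inplace then leaves the .pyd next to the script.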
Testing the Python interface
To keep things object-oriented, the code is wrapped into classes.
First, build a class SVM with two methods:
- train(): training
- eval(): testing
import numpy as np
import detector.svm.vlfeat_svm as svm
import random


class SVM:
    def __init__(self):
        self.name = 'svm'
        self.weights = []           # weight vector w
        self.bias = 0.0             # bias b
        self.lambda_value = 0.0     # regularization parameter lambda
        self.optim_method = 'SGD'

    def train(self, train_datas, labels, lambda_value):
        # train_svm returns [weights, [bias]]
        self.lambda_value = lambda_value
        self.weights, bias_ = svm.train_svm(train_datas, labels, lambda_value)
        self.bias = bias_[0]

    def eval(self, sample):
        if len(sample) != len(self.weights):
            raise ValueError
        # linear decision value: w . x + b
        value = np.sum(np.array(self.weights) * np.array(sample)) + self.bias
        return value
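A minimal usage sketch for this class, reusing the four samples from the C example above (it assumes the compiled vlfeat_svm module is importable via the detector.svm path used in the imports):

clf = SVM()
clf.train(train_datas=[[0.0, -0.5], [0.6, -0.3], [0.0, 0.5], [0.6, 0.0]],
          labels=[1, 1, -1, 1],
          lambda_value=0.01)
print(clf.weights, clf.bias)   # learned w and b
print(clf.eval([0.6, -0.3]))   # a value > 0 means the sample is classified as positive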
On top of this, define a derived class ClassifierSVM:
- setPSamples(): set the positive training samples
- setNSamples(): set the negative training samples
- setLabel(): set the training labels
- train(): training, overrides the parent method
- eval(): testing, inherited from the parent
class ClassifierSVM(SVM):
    def __init__(self):
        super(ClassifierSVM, self).__init__()
        self.nSamples = []     # negative training samples
        self.pSamples = []     # positive training samples
        self.pLabels = []
        self.nLables = []
        self.nLable = -1
        self.pLabel = 1

    def setPSamples(self, samples):
        self.pSamples = samples

    def setNSamples(self, samples):
        self.nSamples = samples

    def setLabel(self, pLabel, nLabel):
        self.pLabel = pLabel
        self.nLable = nLabel

    def train(self, lambda_vlue, shuffle, **kwargs):
        self.lambda_value = lambda_vlue
        self.nLables = [self.nLable for i in range(len(self.nSamples))]
        self.pLabels = [self.pLabel for i in range(len(self.pSamples))]
        train_samples = self.pSamples + self.nSamples
        train_labels = self.pLabels + self.nLables
        # pair each sample with its label so an optional shuffle keeps them aligned
        all_data = []
        for sample, label in zip(train_samples, train_labels):
            all_data.append({'image': sample, 'label': label})
        if shuffle:
            random.shuffle(all_data)
        train_samples = list(map(lambda x: x['image'], all_data))
        train_labels = list(map(lambda x: x['label'], all_data))
        super(ClassifierSVM, self).train(train_datas=train_samples, labels=train_labels, lambda_value=self.lambda_value)
Training result: the weight vector and the bias.
Complete project
import numpy as np
import detector.svm.vlfeat_svm as svm
import random


class SVM:
    def __init__(self):
        self.name = 'svm'
        self.weights = []           # weight vector w
        self.bias = 0.0             # bias b
        self.lambda_value = 0.0     # regularization parameter lambda
        self.optim_method = 'SGD'

    def train(self, train_datas, labels, lambda_value):
        # train_svm returns [weights, [bias]]
        self.lambda_value = lambda_value
        self.weights, bias_ = svm.train_svm(train_datas, labels, lambda_value)
        self.bias = bias_[0]

    def eval(self, sample):
        if len(sample) != len(self.weights):
            raise ValueError
        # linear decision value: w . x + b
        value = np.sum(np.array(self.weights) * np.array(sample)) + self.bias
        return value


class ClassifierSVM(SVM):
    def __init__(self):
        super(ClassifierSVM, self).__init__()
        self.nSamples = []     # negative training samples
        self.pSamples = []     # positive training samples
        self.pLabels = []
        self.nLables = []
        self.nLable = -1
        self.pLabel = 1

    def setPSamples(self, samples):
        self.pSamples = samples

    def setNSamples(self, samples):
        self.nSamples = samples

    def setLabel(self, pLabel, nLabel):
        self.pLabel = pLabel
        self.nLable = nLabel

    def train(self, lambda_vlue, shuffle, **kwargs):
        self.lambda_value = lambda_vlue
        self.nLables = [self.nLable for i in range(len(self.nSamples))]
        self.pLabels = [self.pLabel for i in range(len(self.pSamples))]
        train_samples = self.pSamples + self.nSamples
        train_labels = self.pLabels + self.nLables
        # pair each sample with its label so an optional shuffle keeps them aligned
        all_data = []
        for sample, label in zip(train_samples, train_labels):
            all_data.append({'image': sample, 'label': label})
        if shuffle:
            random.shuffle(all_data)
        train_samples = list(map(lambda x: x['image'], all_data))
        train_labels = list(map(lambda x: x['label'], all_data))
        super(ClassifierSVM, self).train(train_datas=train_samples, labels=train_labels, lambda_value=self.lambda_value)
if __name__ == '__main__':
    print('*' * 30)
    # same data as the C example: three positive samples and one negative sample
    pSamples = [[0.0, -0.5],
                [0.6, -0.3],
                [0.6, 0.0]]
    nSamples = [[0.0, 0.5]]
    classifier = ClassifierSVM()
    classifier.setLabel(pLabel=1, nLabel=-1)
    classifier.setPSamples(samples=pSamples)
    classifier.setNSamples(samples=nSamples)
    classifier.train(lambda_vlue=0.01, shuffle=False)
    print(classifier.weights)
    print(classifier.bias)
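After training, the inherited eval() can score a new sample; the query point below is only an illustrative example:

score = classifier.eval([0.3, -0.2])   # w . x + b
print('positive' if score > 0 else 'negative', score)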