Introduction
Softmax regression is the generalization of logistic regression to multi-class classification, where the class label y can take more than two values. Softmax is used very widely: many complex, sophisticated models use a softmax as their final step to assign probabilities to classes.
Formulas
The main cost function used in softmax regression is the following.
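In standard notation, with $m$ training samples, $k$ classes, and one weight vector $\theta_j$ per class, it is the usual softmax cross-entropy cost:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^{T}x^{(i)}}}$$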
Here $1\{\cdot\}$ is the indicator function: $1\{\text{a true statement}\} = 1$ and $1\{\text{a false statement}\} = 0$.
Differentiating the cost function with respect to the parameters of each of the $k$ classes gives the per-class gradient.
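For class $j$ it takes the standard form

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[x^{(i)}\big(1\{y^{(i)}=j\} - P(y^{(i)}=j \mid x^{(i)};\theta)\big)\Big]$$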
Here $P(y^{(i)}=j \mid x^{(i)};\theta)$ denotes the probability that sample $x^{(i)}$ belongs to class $j$.
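That probability is just the softmax of the class scores:

$$P(y^{(i)}=j \mid x^{(i)};\theta) = \frac{e^{\theta_j^{T}x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T}x^{(i)}}}$$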
With these partial derivatives we can update theta by gradient descent; each iteration performs the following update.
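With learning rate $\alpha$ (0.01 by default in the code below):

$$\theta_j := \theta_j - \alpha\,\nabla_{\theta_j} J(\theta), \qquad j = 1,\dots,k$$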
The formulas above do not include a bias term; with a bias added, the model takes the following form.
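Each class gets its own bias $b_j$, added to the class score before the softmax:

$$P(y^{(i)}=j \mid x^{(i)};\theta,b) = \frac{e^{\theta_j^{T}x^{(i)} + b_j}}{\sum_{l=1}^{k} e^{\theta_l^{T}x^{(i)} + b_l}}$$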
The bias is updated in exactly the same way as theta; we just need its partial derivative, and my code includes that bias update.
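The bias gradient is the theta gradient without the $x^{(i)}$ factor, applied with the same update rule:

$$\frac{\partial J}{\partial b_j} = -\frac{1}{m}\sum_{i=1}^{m}\big(1\{y^{(i)}=j\} - P(y^{(i)}=j \mid x^{(i)};\theta,b)\big), \qquad b_j := b_j - \alpha\,\frac{\partial J}{\partial b_j}$$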
Data
The training data is the MNIST dataset, whose official site is Yann LeCun's website. It contains 60,000 training examples and 10,000 test examples.
File | Content |
---|---|
train-images-idx3-ubyte.gz | training set images (60,000 images) |
train-labels-idx1-ubyte.gz | digit labels for the training set images |
t10k-images-idx3-ubyte.gz | test set images (10,000 images) |
t10k-labels-idx1-ubyte.gz | digit labels for the test set images |
Each image is 28×28 pixels and can be written as a 28×28 matrix of pixel values.
In the dataset, each image is flattened into a vector of length 28 × 28 = 784, so after parsing, train-images-idx3-ubyte becomes a tensor of shape [60000, 784].
train-labels-idx1-ubyte is a vector of length 60,000; each entry is the digit (0–9) shown in the corresponding image of train-images-idx3-ubyte.
The test data has the same structure, just with only 10,000 images.
After downloading, the code to read and parse the data is:
```objc
//
//  MLLoadMNIST.m
//  MNIST
//
//  Created by Jiao Liu on 9/23/16.
//  Copyright © 2016 ChangHong. All rights reserved.
//

#import "MLLoadMNIST.h"
#include <stdio.h>
#include <stdlib.h>

@implementation MLLoadMNIST

// The idx files store their header integers in big-endian order; swap to host byte order.
int reverseInt(int input)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = input & 255;
    ch2 = (input >> 8) & 255;
    ch3 = (input >> 16) & 255;
    ch4 = (input >> 24) & 255;
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}

// Parses an idx3 image file: returns number_of_images vectors,
// each holding n_rows * n_cols pixel values as doubles.
double **readImageData(const char *filePath)
{
    FILE *file = fopen(filePath, "rb");
    double **output = NULL;
    if (file) {
        int magic_number = 0;
        int number_of_images = 0;
        int n_rows = 0;
        int n_cols = 0;
        fread((char *)&magic_number, sizeof(magic_number), 1, file);
        magic_number = reverseInt(magic_number);
        fread((char *)&number_of_images, sizeof(number_of_images), 1, file);
        number_of_images = reverseInt(number_of_images);
        fread((char *)&n_rows, sizeof(n_rows), 1, file);
        n_rows = reverseInt(n_rows);
        fread((char *)&n_cols, sizeof(n_cols), 1, file);
        n_cols = reverseInt(n_cols);
        output = (double **)malloc(sizeof(double *) * number_of_images);
        for (int i = 0; i < number_of_images; ++i) {
            output[i] = (double *)malloc(sizeof(double) * n_rows * n_cols);
            for (int r = 0; r < n_rows; ++r) {
                for (int c = 0; c < n_cols; ++c) {
                    unsigned char temp = 0;
                    fread((char *)&temp, sizeof(temp), 1, file);
                    output[i][(n_rows * r) + c] = (double)temp;
                }
            }
        }
        fclose(file);
    }
    return output;
}

// Parses an idx1 label file: returns number_of_items digit labels (0-9).
int *readLabelData(const char *filePath)
{
    FILE *file = fopen(filePath, "rb");
    int *output = NULL;
    if (file) {
        int magic_number = 0;
        int number_of_items = 0;
        fread((char *)&magic_number, sizeof(magic_number), 1, file);
        magic_number = reverseInt(magic_number);
        fread((char *)&number_of_items, sizeof(number_of_items), 1, file);
        number_of_items = reverseInt(number_of_items);
        output = (int *)malloc(sizeof(int) * number_of_items);
        for (int i = 0; i < number_of_items; ++i) {
            unsigned char temp = 0;
            fread((char *)&temp, sizeof(temp), 1, file);
            output[i] = (int)temp;
        }
        fclose(file);
    }
    return output;
}

@end
```
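A minimal usage sketch for the loader (the paths are placeholders, and the functions expect the already-unzipped idx files):

```objc
// Illustrative only: point the paths at wherever the unpacked MNIST files live.
double **trainImages = readImageData("/tmp/MNIST/train-images-idx3-ubyte");
int *trainLabels = readLabelData("/tmp/MNIST/train-labels-idx1-ubyte");
// trainImages[i] is a 784-element vector of pixel values (0-255) for image i;
// trainLabels[i] is the corresponding digit, 0-9.
```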
Softmax Implementation
- First, I use gradient descent to approach the optimum, so an iteration count is needed; the code defaults to 500 iterations. There are 60,000 training images, and using all of them in every iteration would take a very long time, so by default each iteration randomly picks 100 images to train on.
- The gradient descent learning rate defaults to 0.01.
- The code implements two gradient descent variants: the first walks through the picked images one at a time and updates every class's parameters for each image; the second updates one class at a time, using all of the randomly picked images for each class update. The two are equally efficient, but in my tests the second variant gave slightly higher accuracy.
The code is as follows:
```objc
//
// MLSoftMax.m
// MNIST
//
// Created by Jiao Liu on 9/26/16.
// Copyright © 2016 ChangHong. All rights reserved.
//
#import "MLSoftMax.h"
@implementation MLSoftMax
- (id)initWithLoopNum:(int)loopNum dim:(int)dim type:(int)type size:(int)size descentRate:(double)rate
{
self = [super init];
if (self) {
_iterNum = loopNum == 0 ? 500 : loopNum;
_dim = dim;
_kType = type;
_randSize = size == 0 ? 100 : size;
_bias = malloc(sizeof(double) * type);
_theta = malloc(sizeof(double) * type * dim);
for (int i = 0; i < type; i++) {
_bias[i] = 0;
for (int j = 0; j < dim; j++) {
_theta[i * dim +j] = 0.0f;
}
}
_descentRate = rate == 0 ? 0.01 : rate;
}
return self;
}
- (void)dealloc
{
if (_bias != NULL) {
free(_bias);
_bias = NULL;
}
if (_theta != NULL) {
free(_theta);
_theta = NULL;
}
if (_randomX != NULL) {
free(_randomX);
_randomX = NULL;
}
if (_randomY != NULL) {
free(_randomY);
_randomY = NULL;
}
}
#pragma mark - SoftMax Main
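// Fills _randomX/_randomY with _randSize consecutive samples starting at a random offset into the training data.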
- (void)randomPick:(int)maxSize
{
long rNum = random();
for (int i = 0; i < _randSize; i++) {
_randomX[i] = _image[(rNum+i) % maxSize];
_randomY[i] = _label[(rNum+i) % maxSize];
}
}
/*
- (double *)MaxPro:(double *)index
{
long double maxNum = index[0];
for (int i = 1; i < _kType; i++) {
maxNum = MAX(maxNum, index[i]);
}
long double sum = 0;
for (int i = 0; i < _kType; i++) {
index[i] -= maxNum;
index[i] = expl(index[i]);
sum += index[i];
}
for (int i = 0; i < _kType; i++) {
index[i] /= sum;
}
return index;
}
- (void)updateModel:(double *)index currentPos:(int)pos
{
double *delta = malloc(sizeof(double) * _kType);
for (int i = 0; i < _kType; i++) {
if (i != _randomY[pos]) {
delta[i] = 0.0 - index[i];
}
else
{
delta[i] = 1.0 - index[i];
}
_bias[i] -= _descentRate * delta[i];
for (int j = 0; j < _dim; j++) {
_theta[i * _dim +j] += _descentRate * delta[i] * _randomX[pos][j] / _randSize;
}
}
if (delta != NULL) {
free(delta);
delta = NULL;
}
}
- (void)train
{
_randomX = malloc(sizeof(double) * _randSize);
_randomY = malloc(sizeof(int) * _randSize);
double *index = malloc(sizeof(double) * _kType);
for (int i = 0; i < _iterNum; i++) {
[self randomPick:_trainNum];
for (int j = 0; j < _randSize; j++) {
// calculate wx+b
vDSP_mmulD(_theta, 1, _randomX[j], 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
index = [self MaxPro:index];
[self updateModel:index currentPos:j];
}
}
if (index != NULL) {
free(index);
index = NULL;
}
}
*/
- (int)indicator:(int)label var:(int)x
{
if (label == x) {
return 1;
}
return 0;
}
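// Softmax probability that sample `index` belongs to class `type`; the maximum class score is subtracted before exponentiating, for numerical stability.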
- (double)sigmod:(int)type index:(int) index
{
double up = 0;
for (int i = 0; i < _dim; i++) {
up += _theta[type * _dim + i] * _randomX[index][i];
}
up += _bias[type];
double *down = malloc(sizeof(double) * _kType);
double maxNum = -0xfffffff;
vDSP_mmulD(_theta, 1, _randomX[index], 1, down, 1, _kType, 1, _dim);
vDSP_vaddD(down, 1, _bias, 1, down, 1, _kType);
for (int i = 0; i < _kType; i++) {
maxNum = MAX(maxNum, down[i]);
}
double sum = 0;
for (int i = 0; i < _kType; i++) {
down[i] -= maxNum;
sum += exp(down[i]);
}
if (down != NULL) {
free(down);
down = NULL;
}
return exp(up - maxNum) / sum;
}
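// Gradient of the cost for class `type` over the current mini-batch (the learning rate and 1/batch-size are already folded into `loss`); the bias for that class is updated in place as a side effect.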
- (double *)fderivative:(int)type
{
double *outP = malloc(sizeof(double) * _dim);
for (int i = 0; i < _dim; i++) {
outP[i] = 0;
}
double *inner = malloc(sizeof(double) * _dim);
for (int i = 0; i < _randSize; i++) {
long double sig = [self sigmod:type index:i];
int ind = [self indicator:_randomY[i] var:type];
double loss = -_descentRate * (ind - sig) / _randSize;
_bias[type] += loss * _randSize;
vDSP_vsmulD(_randomX[i], 1, &loss, inner, 1, _dim);
vDSP_vaddD(outP, 1, inner, 1, outP, 1, _dim);
}
if (inner != NULL) {
free(inner);
inner = NULL;
}
return outP;
}
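// Mini-batch gradient descent: each iteration picks _randSize random images and then updates the theta of every class in turn (the second variant described above).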
- (void)train
{
_randomX = malloc(sizeof(double *) * _randSize); // array of pointers into _image
_randomY = malloc(sizeof(int) * _randSize);
for (int i = 0; i < _iterNum; i++) {
[self randomPick:_trainNum];
for (int j = 0; j < _kType; j++) {
double *newTheta = [self fderivative:j];
for (int m = 0; m < _dim; m++) {
_theta[j * _dim + m] = _theta[j * _dim + m] - _descentRate * newTheta[m];
}
if (newTheta != NULL) {
free(newTheta);
newTheta = NULL;
}
}
}
}
- (void)saveTrainDataToDisk
{
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *thetaPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/Theta.txt"];
// NSLog(@"%@",thetaPath);
NSData *data = [NSData dataWithBytes:_theta length:sizeof(double) * _dim * _kType];
[fileManager createFileAtPath:thetaPath contents:data attributes:nil];
NSString *biasPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/bias.txt"];
data = [NSData dataWithBytes:_bias length:sizeof(double) * _kType];
[fileManager createFileAtPath:biasPath contents:data attributes:nil];
}
- (int)predict:(double *)image
{
double maxNum = -0xffffff;
int label = -1;
double *index = malloc(sizeof(double) * _kType);
vDSP_mmulD(_theta, 1, image, 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
for (int i = 0; i < _kType; i++) {
if (index[i] > maxNum) {
maxNum = index[i];
label = i;
}
}
free(index); // avoid leaking the per-call score buffer
return label;
}
- (int)predict:(double *)image withOldTheta:(double *)theta andBias:(double *)bias
{
double maxNum = -0xffffff;
int label = -1;
double *index = malloc(sizeof(double) * _kType);
vDSP_mmulD(theta, 1, image, 1, index, 1, _kType, 1, _dim);
vDSP_vaddD(index, 1, bias, 1, index, 1, _kType);
for (int i = 0; i < _kType; i++) {
if (index[i] > maxNum) {
maxNum = index[i];
label = i;
}
}
free(index); // avoid leaking the per-call score buffer
return label;
}
@end
```
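For completeness, here is a rough sketch of how the two files fit together. MLSoftMax.h is not shown above, so the property names (image, label, trainNum) are assumptions based on the instance variables used in the implementation:

```objc
// Hypothetical driver code; the exact property names come from MLSoftMax.h.
MLSoftMax *softmax = [[MLSoftMax alloc] initWithLoopNum:500
                                                    dim:28 * 28
                                                   type:10
                                                   size:100
                                            descentRate:0.01];
softmax.image = readImageData("/tmp/MNIST/train-images-idx3-ubyte");
softmax.label = readLabelData("/tmp/MNIST/train-labels-idx1-ubyte");
softmax.trainNum = 60000;
[softmax train];

// Measure accuracy on the 10,000 test images.
double **testImages = readImageData("/tmp/MNIST/t10k-images-idx3-ubyte");
int *testLabels = readLabelData("/tmp/MNIST/t10k-labels-idx1-ubyte");
int correct = 0;
for (int i = 0; i < 10000; i++) {
    if ([softmax predict:testImages[i]] == testLabels[i]) {
        correct++;
    }
}
NSLog(@"accuracy: %.2f%%", correct * 100.0 / 10000);
```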
The final training results:
Changing the number of iterations, the learning rate, and the other parameters all shifts the accuracy. In my tests the default parameters came close to the best recognition rate, around 90%.
Conclusion
An accuracy of 90% is far from optimal, since this is only a simple model; adding a convolutional neural network would improve the results.
I have also written a convolutional neural network version, but its speed and accuracy are not good enough yet, so I will share it after further optimization.