SVM
A quick review of the SVM from before: the goal is to find a separating function with the maximum margin, one that keeps the positive and negative samples as far from it as possible. Whether the margin is maximal is not judged by the point farthest from the function; rather, we find the point closest to the separating function and require that this closest point be as far away as possible.
The large margin acts as a form of regularization.
The earlier post on the SVM algorithm: http://www.reibang.com/p/8fd28df734a0
Linear Classification
A linear SVM is a linear classification method. The input is $x_i$, the output is the label $y_i$, the weights for each class form the rows of $W$, and the bias term is $b$. The score function is $f(x_i, W, b) = W x_i + b$.
This yields one score per class, and whichever class scores highest is the predicted class. For example, suppose the image-recognition task has three classes and the image has 4 pixels, stretched into a single column:
By convention, $W$ and $b$ are merged: $x$ gets an extra entry that is always 1 (a column of ones in the data matrix), so that $f(x_i, W) = W x_i$.
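Below is a minimal NumPy sketch of this setup (all numbers are made up: 3 classes, 4 pixels), confirming that the bias trick produces identical scores:

import numpy as np

# hypothetical weights: one row per class, one column per pixel
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5,  1.3, 2.1, 0.0],
              [0.0,  0.25, 0.2, -0.3]])   # shape (3, 4)
b = np.array([1.1, 3.2, -1.2])            # one bias per class
x = np.array([56.0, 231.0, 24.0, 2.0])    # 4 pixel values

scores = W.dot(x) + b                     # one score per class

# the bias trick: fold b into W as an extra column, append a 1 to x
W_ext = np.hstack([W, b.reshape(-1, 1)])  # shape (3, 5)
x_ext = np.append(x, 1.0)                 # shape (5,)
assert np.allclose(W_ext.dot(x_ext), scores)
print(scores, '-> predicted class:', np.argmax(scores))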
Loss Function
The earlier SVM required the positive and negative samples to have enough room away from the separating function. In the example above the correct class is cat, yet the cat score is the lowest; the usual remedy is to raise the cat score, which is what improves the accuracy on cats. SVM, however, demands a maximal margin. Translated to this setting, the cat score must not merely be greater than the other scores, it must beat them by a minimum threshold; the cat score is not allowed to fall inside that margin.
So the correct class's score should exceed each of the other classes' scores by at least a threshold $\Delta$:

$$s_{y_i} \geq s_j + \Delta \quad \text{for all } j \neq y_i$$

Here $s_{y_i}$ is the score of the correct class and $s_j$ is the score of any other class. The loss function is therefore the hinge loss:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

The loss is 0 only when the correct score exceeds every other score by at least the threshold; otherwise each violation contributes to the loss. There is also a squared hinge loss variant:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)^2$$
Compared with the linear hinge loss SVM, the squared hinge loss SVM penalizes points that violate the margin threshold more heavily: the larger the violation, the harsher the penalty. In some practical applications the squared hinge loss works better. Which one to use can be decided by cross-validation on the actual problem.
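A small sketch comparing the two losses on a single sample (with $\Delta = 1$ and made-up scores; class 0 is the correct class):

import numpy as np

scores = np.array([3.2, 5.1, -1.7])  # hypothetical class scores
y = 0                                # index of the correct class
delta = 1.0

margins = np.maximum(0, scores - scores[y] + delta)
margins[y] = 0                       # the correct class contributes no loss

hinge_loss = np.sum(margins)             # max(0,5.1-3.2+1) + max(0,-1.7-3.2+1) = 2.9
squared_hinge_loss = np.sum(margins**2)  # 2.9^2 = 8.41
print(hinge_loss, squared_hinge_loss)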
Finally, overfitting has to be dealt with by adding a regularization penalty. The L2 regularization is:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

Adding the regularization term gives the full loss:

$$L = \frac{1}{N}\sum_{i} L_i + \lambda R(W)$$

where $N$ is the number of training samples (the data loss is averaged over them) and $\lambda$ is the regularization strength.
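As a toy illustration (all numbers below are made up), the full loss is just the averaged per-sample hinge losses plus the weighted penalty:

import numpy as np

per_sample_losses = np.array([2.9, 0.0, 1.3])  # hypothetical L_i values
W = np.array([[0.2, -0.5],
              [1.5,  1.3]])                    # hypothetical weights
lam = 0.1                                      # regularization strength lambda

data_loss = per_sample_losses.mean()           # (1/N) * sum_i L_i
reg_loss = lam * np.sum(W * W)                 # lambda * R(W)
print(data_loss + reg_loss)                    # 1.4 + 0.1 * 4.23 = 1.823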
Code Implementation
First, reading the CIFAR-10 data:
import os
import pickle
import platform
import numpy as np

def load_pickle(f):
    version = platform.python_version_tuple()
    if version[0] == '2':
        return pickle.load(f)
    elif version[0] == '3':
        # the CIFAR-10 pickles were written by Python 2; latin1 keeps the bytes intact
        return pickle.load(f, encoding='latin1')
    raise ValueError("invalid python version: {}".format(version))
def loadCIFAR_batch(filename):
    with open(filename, 'rb') as f:
        datadict = load_pickle(f)
        x = datadict['data']
        y = datadict['labels']
        # each batch holds 10000 rows of 3072 bytes (3 channels x 32 x 32);
        # reorder the axes to (N, H, W, C) so the images display correctly
        x = x.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')
        y = np.array(y)
    return x, y
def loadCIFAR10(root):
    xs = []
    ys = []
    # the five training batches
    for b in range(1, 6):
        f = os.path.join(root, 'data_batch_%d' % (b, ))
        x, y = loadCIFAR_batch(f)
        xs.append(x)
        ys.append(y)
    X = np.concatenate(xs)
    Y = np.concatenate(ys)
    x_test, y_test = loadCIFAR_batch(os.path.join(root, 'test_batch'))
    return X, Y, x_test, y_test
Each file has to be read in turn: load_pickle first parses the file into a dict, from which the data is extracted. Since conventional images are laid out as (count, height, width, RGB channels), loadCIFAR_batch uses transpose to rearrange the dimensions into that order. Finally the data from all the files is concatenated together.
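A quick shape check of the loader (the directory path is the one used at the end of this post):

X, Y, x_test, y_test = loadCIFAR10('../Data/cifar-10-batches-py')
print(X.shape, Y.shape)            # (50000, 32, 32, 3) (50000,)
print(x_test.shape, y_test.shape)  # (10000, 32, 32, 3) (10000,)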
Next comes reshaping and splitting the data:
def data_validation(x_train, y_train, x_test, y_test):
    num_training = 49000
    num_validation = 1000
    num_test = 1000
    num_dev = 500
    # zero-center the data using the training mean; the test set must be
    # shifted by the same mean, otherwise its scores are not comparable
    mean_image = np.mean(x_train, axis=0)
    x_train -= mean_image
    x_test = x_test - mean_image
    # validation split
    mask = range(num_training, num_training + num_validation)
    X_val = x_train[mask]
    Y_val = y_train[mask]
    # training split
    mask = range(num_training)
    X_train = x_train[mask]
    Y_train = y_train[mask]
    # small development subset sampled from the training data
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = x_train[mask]
    Y_dev = y_train[mask]
    # test split
    mask = range(num_test)
    X_test = x_test[mask]
    Y_test = y_test[mask]
    # flatten each image into a single row vector
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    # bias trick: append a column of ones to every split
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    return X_val, Y_val, X_train, Y_train, X_dev, Y_dev, X_test, Y_test
Each image has to be flattened into one long row vector.
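A sanity check of the resulting shapes (each 32x32x3 image flattens to 3072 values, plus one bias dimension):

X_val, Y_val, X_train, Y_train, X_dev, Y_dev, X_test, Y_test = data_validation(x_train, y_train, x_test, y_test)
print(X_train.shape, X_val.shape)  # (49000, 3073) (1000, 3073)
print(X_dev.shape, X_test.shape)   # (500, 3073) (1000, 3073)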
First, a look at what the data looks like:
import matplotlib.pyplot as plt

def showPicture(x_train, y_train):
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_classes = 7
    for y, cls in enumerate(classes):
        # indices of all training images belonging to class y
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_classes, replace=False)
        for i, idx in enumerate(idxs):
            # fill the grid column by column: one column per class
            plt_index = i * num_classes + y + 1
            plt.subplot(samples_per_classes, num_classes, plt_index)
            plt.imshow(x_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
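Note that showPicture expects the raw, unflattened training data, so it is called before data_validation:

showPicture(x_train, y_train)  # 7 random samples per class, one column per class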
Then it is time to implement the formulas above:
def loss(self, x, y, reg):
    num_train = x.shape[0]
    scores = x.dot(self.W)
    # score of the correct class for every sample, as a column vector
    correct_class_score = scores[range(num_train), list(y)].reshape(-1, 1)
    # hinge loss with delta = 1
    margin = np.maximum(0, scores - correct_class_score + 1)
    margin[range(num_train), list(y)] = 0
    loss = np.sum(margin) / num_train + 0.5 * reg * np.sum(self.W * self.W)
    # gradient: each violating class contributes +x_i to its column of W,
    # and the correct class contributes -x_i once per violation
    num_classes = self.W.shape[1]
    inter_mat = np.zeros((num_train, num_classes))
    inter_mat[margin > 0] = 1
    inter_mat[range(num_train), list(y)] = -np.sum(inter_mat, axis=1)
    dW = (x.T).dot(inter_mat)
    dW = dW / num_train + reg * self.W
    return loss, dW
These are all standard operations: compute the scores, compute the loss, and finally update W along the gradient with SGD.
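For reference, the analytic gradient that inter_mat encodes is (with $\mathbb{1}(\cdot)$ the indicator function and $\Delta = 1$):

$$\nabla_{w_j} L_i = \mathbb{1}(s_j - s_{y_i} + \Delta > 0)\, x_i \qquad (j \neq y_i)$$

$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}(s_j - s_{y_i} + \Delta > 0)\Big)\, x_i$$

Each row of inter_mat holds exactly these indicator values and counts, so x.T.dot(inter_mat) accumulates the gradient over the whole batch in a single matrix multiply.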
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100, batch_size=200, verbose=False):
    num_train, dim = X.shape
    num_classes = np.max(y) + 1
    if self.W is None:
        # small random initialization
        self.W = 0.001 * np.random.randn(dim, num_classes)
    # run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        # sample a minibatch (with replacement, which is cheaper)
        idx_batch = np.random.choice(num_train, batch_size, replace=True)
        X_batch = X[idx_batch]
        y_batch = y[idx_batch]
        # evaluate loss and gradient, then take a gradient step
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)
        self.W -= learning_rate * grad
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return loss_history
Prediction:
def predict(self, X):
    scores = X.dot(self.W)
    # the predicted class is the one with the highest score
    y_pred = np.argmax(scores, axis=1)
    return y_pred
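The three methods above live on a linear classifier class; a minimal skeleton, assuming nothing beyond a W attribute, would be:

class LinearSVM(object):
    # holds the weight matrix W; loss, train and predict are the methods above
    def __init__(self):
        self.W = None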
Finally, running everything:
import time

svm = LinearSVM()
tic = time.time()
cifar10_name = '../Data/cifar-10-batches-py'
x_train, y_train, x_test, y_test = loadCIFAR10(cifar10_name)
X_val, Y_val, X_train, Y_train, X_dev, Y_dev, X_test, Y_test = data_validation(x_train, y_train, x_test, y_test)
loss_hist = svm.train(X_train, Y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=3000, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))

# plot the loss curve
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

# accuracy on the test set
y_test_pred = svm.predict(X_test)
test_accuracy = np.mean(Y_test == y_test_pred)
print('accuracy: %f' % test_accuracy)

# visualize the learned weights for each class
w = svm.W[:-1, :]  # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights into the displayable 0..255 range
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()
First, this plots the overall trend of the loss curve:
Finally, it visualizes the weights W, to see what feature template has been learned for each class: