https://www.zhihu.com/question/41252833
http://ufldl.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92
Here p is the true distribution and q is the predicted (non-true) distribution. (The closer q is to p, the lower the cross-entropy.)
In the softmax classifier, cross-entropy is used to score sample predictions (the lower the cross-entropy, the more accurate the prediction). Take handwritten digit recognition as an example:
For handwritten digit 1, the true distribution is (0,1,0,0,0, 0,0,0,0,0)
For handwritten digit 2, the true distribution is (0,0,1,0,0, 0,0,0,0,0)
…
So for handwritten digit 1, the cross-entropy is: H = 0·log(1/p(0)) + 1·log(1/p(1)) + … + 0·log(1/p(9)) = log(1/p(1)) = -log p(1)
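A quick numeric check of this reduction (a sketch; the predicted distribution q below is made up for illustration): with a one-hot true distribution, only the true-class term of the sum survives, so the cross-entropy equals -log of the predicted probability of the true class.

```python
import numpy as np

# One-hot true distribution for handwritten digit 1 (10 classes, index 1)
p_true = np.zeros(10)
p_true[1] = 1.0

# A hypothetical softmax output q (sums to 1)
q = np.array([0.05, 0.6, 0.05, 0.05, 0.05, 0.04, 0.04, 0.04, 0.04, 0.04])

# H(p, q) = -sum_i p_i * log q_i ; all zero-weight terms vanish
H = -np.sum(p_true * np.log(q))
assert np.isclose(H, -np.log(q[1]))  # H reduces to -log p(true class)
```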
Cross-entropy in softmax: for sample i with true class c, H(i) = -log ps(i)[c], where ps(i) = softmax(x(i) · W).
Differentiating gives the gradient used for the update (for a deeper treatment, look into matrix calculus): ∂H/∂W = x(i).T · (ps(i) - y(i))
Update rule for W (learning rate α): W += α · x(i).T · (y(i) - ps(i))
In matrix form (assuming x(i) is 1×1024 and there are 10 classes), the gradient averaged over m samples is: - { x[1×1024].T * ([0,1,0,0,0, 0,0,0,0,0] - [p(0),p(1),…,p(9)]) } * 1/m
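A minimal shape check of that gradient term (a sketch with zero-initialized weights; the 1024-feature, 10-class dimensions are the ones assumed above):

```python
import numpy as np

x = np.random.randn(1, 1024)           # one sample, as a row vector
y = np.zeros((1, 10)); y[0, 1] = 1.0   # one-hot label for digit 1
W = np.zeros((1024, 10))

scores = np.dot(x, W)                          # 1x10 scores
p = np.exp(scores) / np.sum(np.exp(scores))    # softmax; uniform for zero W

# Gradient of cross-entropy w.r.t. W: outer product, same shape as W
grad = -np.dot(x.T, (y - p))
assert grad.shape == (1024, 10)
```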
To see exactly how the parameters change step by step, study gradient descent itself.
All parameters are initialized to 0, W[1024×10].
import numpy as np

# Four 2-D samples (one per row)
x1 = np.reshape([1, 0], (1, 2))
x2 = np.reshape([0, 1], (1, 2))
x3 = np.reshape([1, 1], (1, 2))
x4 = np.reshape([0, 0], (1, 2))

# Small random init for the 2x4 weight matrix
w = np.random.randn(2, 4) * 0.001

# One-hot labels over 4 classes
y1 = np.reshape([1, 0, 0, 0], (1, 4))
y2 = np.reshape([0, 1, 0, 0], (1, 4))
y3 = np.reshape([0, 0, 1, 0], (1, 4))
y4 = np.reshape([0, 0, 0, 1], (1, 4))

for i in range(5000):
    # Forward pass: scores, then softmax probabilities
    ys1 = np.dot(x1, w)
    ps1 = np.exp(ys1) / np.sum(np.exp(ys1))
    ys2 = np.dot(x2, w)
    ps2 = np.exp(ys2) / np.sum(np.exp(ys2))
    ys3 = np.dot(x3, w)
    ps3 = np.exp(ys3) / np.sum(np.exp(ys3))
    ys4 = np.dot(x4, w)
    ps4 = np.exp(ys4) / np.sum(np.exp(ys4))
    # Update: W += lr * x.T (y - p), one step per sample
    w += 0.02 * np.dot(x1.T, (y1 - ps1))
    w += 0.02 * np.dot(x2.T, (y2 - ps2))
    w += 0.02 * np.dot(x3.T, (y3 - ps3))
    w += 0.02 * np.dot(x4.T, (y4 - ps4))  # was y3 - ps3: fixed

# Note: x4 is all-zero, so its gradient x4.T(y4 - ps4) is zero and
# ps4 stays uniform forever (the model has no bias term).
print(ps1)
print(ps2)
print(ps3)
print(ps4)
Summary (y(i) is the label, x(i) is the sample):
x(i) * W = ys(i) ----softmax-----> ps(i)
W += 0.02 * (x(i).T * (y(i) - ps(i)))
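The per-sample updates above can also be written in batch form; a sketch assuming the same four samples stacked into matrices X (4×4 scores from 4×2 inputs) and one-hot labels Y. Since x4 = [0,0] yields zero scores with no bias term, it can never be fit, so the check covers only the first three samples.

```python
import numpy as np

X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)  # 4 samples
Y = np.eye(4)                                                # one-hot labels
W = np.random.randn(2, 4) * 0.001

for _ in range(5000):
    scores = X @ W
    # Row-wise softmax (max subtracted for numerical stability)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    P = e / e.sum(axis=1, keepdims=True)
    # Batch update: W += lr * X.T (Y - P)
    W += 0.02 * X.T @ (Y - P)

print(P.argmax(axis=1))  # first three samples end up correctly classified
```

Summing the per-sample outer products into one matrix product `X.T @ (Y - P)` gives the same total update as the four separate `np.dot(x.T, y - p)` steps, just applied simultaneously instead of sequentially.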