Preface
In the previous posts we went through neural networks and the backpropagation algorithm in detail. Now we can use that algorithm to recognize handwritten digits, that is, to implement a multi-class classification problem in Python.
Implementing the Neural Network
First, based on the principles of neural networks covered earlier, we implement the forward-propagation algorithm in Python.
- Importing the required libraries
As in the previous code, we import the libraries needed for the neural-network computations:
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as scio
import scipy.optimize as opt
- Initializing some parameters
These parameters are described in the comments below:
input_layer_size = 400  # 20x20 input pixel images
hidden_layer_size = 25  # 25 hidden units
num_labels = 10         # 10 output classes, the digits 0 to 9
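Before anything can be visualized or trained, the training set has to be loaded. The original post does not show this step; the sketch below assumes the data file is named ex4data1.mat, mirroring the ex4weights.mat file loaded further down.
data = scio.loadmat('ex4data1.mat')   # assumed file name, matching ex4weights.mat used later
X = data['X']                         # 5000 x 400 matrix, one unrolled 20x20 image per row
y = data['y'].flatten()               # 5000 labels in 1..10 (the digit 0 is stored as label 10)
m = X.shape[0]                        # number of training examples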
- Data visualization
Using the training set loaded above, we randomly select 100 examples and visualize them. The code is shown below (see the comments):
def display_data(x):
    (m, n) = x.shape
    # m = 100, n = 400
    # width and height (in pixels) of a single digit image
    example_width = np.round(np.sqrt(n)).astype(int)      # example_width = 20
    example_height = (n / example_width).astype(int)      # example_height = 20
    # number of rows and columns of digits to display
    display_rows = np.floor(np.sqrt(m)).astype(int)       # display_rows = 10
    display_cols = np.ceil(m / display_rows).astype(int)  # display_cols = 10
    # padding between individual images
    pad = 1
    # initialize the display grid, 211 x 211 pixels: 1 + 10 * (20 + 1)
    display_array = - np.ones((pad + display_rows * (example_height + pad),
                               pad + display_cols * (example_width + pad)))
    # copy each training example into the grid
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:
                break
            # each cell holds one 20x20 digit, normalized by its largest absolute value;
            # the broadcasted indices also transpose the image, since each example is
            # stored in column-major (MATLAB) order
            max_val = np.max(np.abs(x[curr_ex]))
            display_array[pad + j * (example_height + pad) + np.arange(example_height),
                          pad + i * (example_width + pad) + np.arange(example_width)[:, np.newaxis]] = \
                x[curr_ex].reshape((example_height, example_width)) / max_val
            curr_ex += 1
        if curr_ex >= m:
            break
    # display the image
    plt.figure()
    plt.imshow(display_array, cmap='gray', extent=[-1, 1, -1, 1])
    plt.axis('off')
rand_indices = np.random.permutation(range(m))
selected = X[rand_indices[0:100], :]  # randomly select 100 examples
display_data(selected)
plt.show()
Finally, the 100 randomly chosen examples are passed to the function above, producing the visualization shown below:
(Figure: 100 randomly selected handwritten digits from the training set)
Choosing the Neural Network Model
The neural network we use is shown in the figure below. It consists of three layers: an input layer, a hidden layer, and an output layer. The input layer receives the pixel matrix of a digit image, which is 20x20 pixels, so there are 400 input units (not counting the bias unit). The training set already provides trained parameter matrices. Since the network has three layers, two parameter matrices are needed: the first is a 25x401 matrix and the second is a 10x26 matrix, whose 10 rows correspond to the 10 output classes.
(Figure: the three-layer neural network architecture)
The Cost Function for Backpropagation
The regularized cost function of the neural network is:

$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\bigl((h_\Theta(x^{(i)}))_k\bigr)-\bigl(1-y_k^{(i)}\bigr)\log\bigl(1-(h_\Theta(x^{(i)}))_k\bigr)\right]+\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\bigl(\Theta_{j,k}^{(1)}\bigr)^2+\sum_{j=1}^{10}\sum_{k=1}^{25}\bigl(\Theta_{j,k}^{(2)}\bigr)^2\right]$$

Here K = 10 is the number of classes. Since the labels are given as integers in {1, ..., 10} while the network has K output units, each label y must be recoded as a K-dimensional vector. For example, if an example represents the digit 5, then y = (0, 0, 0, 0, 1, 0, 0, 0, 0, 0)^T, i.e. the 5th element of the vector is 1 and all other elements are 0.
Putting this together, and following the earlier discussion of the backpropagation cost function, the cost and the gradients are computed with the following code:
def nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd):
    # recover the two weight matrices from the unrolled parameter vector
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, input_layer_size + 1)
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(num_labels, hidden_layer_size + 1)
    m = y.size
    cost = 0
    theta1_grad = np.zeros(theta1.shape)  # 25 x 401
    theta2_grad = np.zeros(theta2.shape)  # 10 x 26
    # recode the labels as one-hot vectors
    Y = np.zeros((m, num_labels))  # 5000 x 10
    for i in range(m):
        Y[i, y[i] - 1] = 1
    # forward propagation
    a1 = np.c_[np.ones(m), X]  # 5000 x 401
    a2 = np.c_[np.ones(m), sigmoid(np.dot(a1, theta1.T))]  # 5000 x 26
    hypothesis = sigmoid(np.dot(a2, theta2.T))  # 5000 x 10
    # weights without the bias column, used for regularization
    reg_theta1 = theta1[:, 1:]  # 25 x 400
    reg_theta2 = theta2[:, 1:]  # 10 x 25
    cost = np.sum(-Y * np.log(hypothesis) - np.subtract(1, Y) * np.log(np.subtract(1, hypothesis))) / m \
        + (lmd / (2 * m)) * (np.sum(reg_theta1 * reg_theta1) + np.sum(reg_theta2 * reg_theta2))
    # backpropagation
    e3 = hypothesis - Y  # 5000 x 10
    e2 = np.dot(e3, theta2) * (a2 * np.subtract(1, a2))  # 5000 x 26
    e2 = e2[:, 1:]  # drop the intercept column  # 5000 x 25
    delta1 = np.dot(e2.T, a1)  # 25 x 401
    delta2 = np.dot(e3.T, a2)  # 10 x 26
    # regularization terms of the gradients (the bias column is not regularized)
    p1 = (lmd / m) * np.c_[np.zeros(hidden_layer_size), reg_theta1]
    p2 = (lmd / m) * np.c_[np.zeros(num_labels), reg_theta2]
    theta1_grad = p1 + (delta1 / m)
    theta2_grad = p2 + (delta2 / m)
    grad = np.concatenate([theta1_grad.flatten(), theta2_grad.flatten()])
    return cost, grad
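The cost function above (and the prediction code later on) relies on a sigmoid() helper that is not listed in this post; a minimal sketch, assuming the standard logistic function, is:
def sigmoid(z):
    # standard logistic function, applied element-wise
    return 1 / (1 + np.exp(-z))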
Initialize some parameters and call the function above; the cost is computed as follows:
data = scio.loadmat('ex4weights.mat')
theta1 = data['Theta1']
theta2 = data['Theta2']
lmd = 1
nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])
cost, grad = nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)
print(cost)
- The derivative of the sigmoid function
The derivative of the activation function is given by:

$$g'(z) = g(z)\bigl(1 - g(z)\bigr), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Based on this formula, the implementation in code is:
def sigmoid_gradient(z):
    g = np.zeros(z.shape)
    g = sigmoid(z) * (1 - sigmoid(z))
    return g
Feeding in a set of test values, the computed result is shown below:
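The exact test vector used in the original is not shown here; the following call is illustrative:
print(sigmoid_gradient(np.array([-1, -0.5, 0, 0.5, 1])))
# the gradient peaks at z = 0, where sigmoid_gradient(0) = 0.25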
- Random initialization
As discussed earlier, the parameters of a neural network cannot all be initialized to zero, unlike in logistic regression; they should be initialized randomly, as in the following code:
def rand_initialization(l_in, l_out):
    w = np.zeros((l_out, 1 + l_in))
    # break symmetry by drawing uniformly from [-ep_init, ep_init]
    ep_init = 0.12
    w = np.random.rand(l_out, 1 + l_in) * (2 * ep_init) - ep_init
    return w
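Typical usage is to create the two initial weight matrices and unroll them into a single parameter vector for the optimizer. The variable names below are not from the original post:
initial_theta1 = rand_initialization(input_layer_size, hidden_layer_size)   # 25 x 401
initial_theta2 = rand_initialization(hidden_layer_size, num_labels)         # 10 x 26
initial_nn_params = np.concatenate([initial_theta1.flatten(), initial_theta2.flatten()])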
- Gradient checking
To verify that the backpropagation implementation is free of errors, we compare its analytic gradients with numerically estimated ones. Each partial derivative is approximated by the two-sided difference $\frac{\partial J}{\partial \theta_p} \approx \frac{J(\theta + \epsilon e_p) - J(\theta - \epsilon e_p)}{2\epsilon}$, which the following code implements:
def compute_numerial_gradient(cost_func, theta):
    numgrad = np.zeros(theta.size)
    perturb = np.zeros(theta.size)
    e = 1e-4
    for p in range(theta.size):
        # perturb one parameter at a time and take the centered difference
        perturb[p] = e
        loss1, grad1 = cost_func(theta - perturb)
        loss2, grad2 = cost_func(theta + perturb)
        numgrad[p] = (loss2 - loss1) / (2 * e)
        perturb[p] = 0
    return numgrad
Gradient checking is run on a small test network whose weights are generated deterministically, so that the check is reproducible. The helper that generates these debug weights is:
def debug_initialize_weights(fan_out, fan_in):
    w = np.zeros((fan_out, 1 + fan_in))
    # deterministic "pseudo-random" values based on sin, so results are reproducible
    w = np.sin(np.arange(w.size)).reshape(w.shape) / 10
    return w
With the helpers above, the gradient check itself is implemented as follows:
def check_nn_gradients(lmd):
    # a small test network
    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5
    # generate some deterministic "random" parameters and data
    theta1 = debug_initialize_weights(hidden_layer_size, input_layer_size)
    theta2 = debug_initialize_weights(num_labels, hidden_layer_size)
    X = debug_initialize_weights(m, input_layer_size - 1)
    y = 1 + np.mod(np.arange(1, m + 1), num_labels)
    # unroll the parameters
    nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])

    def cost_func(p):
        return nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)

    cost, grad = cost_func(nn_params)
    numgrad = compute_numerial_gradient(cost_func, nn_params)
    print(np.c_[grad, numgrad])
The cost_func() and grad_func() wrappers, which return the cost and the gradient separately (as needed when they are passed to the optimizer), are implemented as follows:
def cost_func(p):
    return nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)[0]

def grad_func(p):
    return nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)[1]
Running the code above prints the gradients obtained by backpropagation next to the numerically estimated ones; comparing the two columns shows that they agree almost exactly.
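A common quantitative version of this comparison, not shown in the original post, is the relative difference between the two gradient vectors; it could be added at the end of check_nn_gradients():
diff = np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)
print('Relative difference: ', diff)   # very small (e.g. < 1e-9) when backpropagation is correct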
- Visualizing the hidden layer
By calling display_data() with the hidden-layer weight parameters, the representation learned by the hidden layer can be visualized as shown below:
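For example, dropping the bias column of the first weight matrix and reusing the earlier visualization helper:
display_data(theta1[:, 1:])   # each row of theta1 (without the bias term) is shown as a 20x20 image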
Prediction
Given the input data and the weight parameters obtained by training the network, handwritten digits can now be recognized. The code is shown below; it returns the predicted class for each example, from which the training accuracy is computed.
def predict(theta1, theta2, x):
    m = x.shape[0]
    # forward propagation through the trained network
    x = np.c_[np.ones(m), x]
    h1 = sigmoid(np.dot(x, theta1.T))
    h1 = np.c_[np.ones(h1.shape[0]), h1]
    h2 = sigmoid(np.dot(h1, theta2.T))
    # pick the class with the highest output (labels are 1-based)
    p = np.argmax(h2, axis=1) + 1
    return p
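The post does not list the training step itself. A minimal sketch of how the weights could be learned with scipy.optimize, using the cost_func()/grad_func() wrappers and the randomly initialized parameters from above, and how the training accuracy could then be computed; the solver choice, maxiter value, and variable names here are assumptions, not from the original:
# hypothetical training step: minimize the regularized cost starting from the random initialization
res = opt.minimize(fun=cost_func, x0=initial_nn_params, jac=grad_func,
                   method='L-BFGS-B', options={'maxiter': 100})
# recover the trained weight matrices from the unrolled result
trained_theta1 = res.x[:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, input_layer_size + 1)
trained_theta2 = res.x[hidden_layer_size * (input_layer_size + 1):].reshape(num_labels, hidden_layer_size + 1)
# predict on the training set and report the accuracy
pred = predict(trained_theta1, trained_theta2, X)
print('Training accuracy: {}%'.format(np.mean(pred == y) * 100))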
Finally, running the code above gives the training accuracy shown below: