Full code: >> click me. Stars and forks are welcome; let's learn together.
Purpose of the Network
In other words, the application scenario: use a deep (multi-layer) neural network to recognize whether a given image is a picture of a cat.
Mathematical Formulation
Given an image $X$ fed into the network, decide whether it is a picture of a cat.
Network Architecture
Processing pipeline of the multi-layer neural network:
- X --> $[linear + relu]^{(L-1)}$ ---> [linear + sigmoid] ---> $\hat{y}$
Mathematical Formulation
Training set: $X = [x^{(1)},x^{(2)},...,x^{(i)},...,x^{(m)}]$; corresponding labels: $Y = [y^{(1)},y^{(2)},...,y^{(i)},...,y^{(m)}]$.
For each image $x^{(i)}$ in the training set, the processing is:
repeat:
$z^{(i)} = w^{T}x^{(i)} + b$
$\hat{y}^{(i)} = a^{(i)} = g(z^{(i)})$
$L(a^{(i)},y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$
Cost function:
$J = \frac{1}{m} \sum_{i=1}^{m} L(a^{(i)},y^{(i)})$
Finally, backpropagation is used to compute the gradients of the parameters $W$ and $b$.
Model Definition
Steps for defining the model
- Define the model structure (e.g. the number of features of the input vector)
- Initialize the model parameters
- Loop:
  - Forward propagation: compute the loss
  - Backward propagation: compute the gradients
  - Gradient descent: update the parameters
Code Implementation
Activation Functions
- The sigmoid activation function and its backward pass
import numpy as np

def sigmoid(Z):
    """
    Sigmoid activation function.
    :param Z: linear output of the current layer
    :return:
        - A: activation value sigmoid(Z)
        - cache: Z, stored so the backward pass can reuse it directly
    """
    A = 1.0/(1 + np.exp(-Z))
    cache = Z
    return A, cache

def sigmoid_backward(dA, cache):
    """
    Backward pass of the sigmoid activation.
    :param dA: derivative of the loss with respect to A
    :param cache: the input Z cached during the forward pass
    :return: dZ
    """
    Z = cache
    s = 1.0/(1 + np.exp(-Z))
    dZ = dA * s * (1 - s)
    return dZ
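As a quick sanity check (not part of the original code), the analytic gradient from sigmoid_backward can be compared against a finite-difference approximation:

    # Hypothetical sanity check: compare sigmoid_backward with a numerical derivative.
    Z = np.array([[0.5, -1.2, 3.0]])
    A, cache = sigmoid(Z)
    dA = np.ones_like(A)              # pretend dL/dA = 1
    dZ = sigmoid_backward(dA, cache)  # analytic gradient

    eps = 1e-7
    numeric = (sigmoid(Z + eps)[0] - sigmoid(Z - eps)[0]) / (2 * eps)
    print(np.allclose(dZ, numeric))   # expected: True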
- The ReLU activation function and its backward pass
def relu(Z):
    """
    ReLU activation function.
    :param Z: linear output of the current layer
    :return:
        - A: activation value max(0, Z)
        - cache: Z, stored for the backward pass
    """
    A = np.maximum(0, Z)  # np.maximum compares element-wise; built-in max only compares single values
    cache = Z
    return A, cache

def relu_backward(dA, cache):
    """
    Backward pass of the ReLU activation: relu(Z) = np.maximum(0, Z),
    so the derivative is 1 or 0  ---->  dZ = dA or 0.
    :param dA: derivative of the loss with respect to A
    :param cache: the input Z cached during the forward pass
    :return: dZ
    """
    Z = cache
    dZ = np.array(dA, copy=True)
    # when Z <= 0, dZ = 0
    dZ[Z <= 0] = 0
    assert(dZ.shape == Z.shape)  # make sure the shapes match
    return dZ
Parameter Initialization
The weights $W$ can be initialized to all zeros, to small random values, or with He initialization; the biases $b$ are always initialized to zero. The model below uses He initialization.
def initialize_parameters_deep(layer_dims, type='he'):
    """
    Parameter initialization for a deep neural network.
    :param layer_dims: list with the number of units in each layer, e.g. [12288, 100, 10, 1]
    :param type: initialization method: zeros, random, he
    :return: parameters: dictionary of parameters
    """
    np.random.seed(10)
    parameters = {}
    L = len(layer_dims)
    if type == "zeros":
        for i in range(1, L):
            parameters['W'+str(i)] = np.zeros((layer_dims[i], layer_dims[i-1]))
            parameters['b'+str(i)] = np.zeros((layer_dims[i], 1))
            assert (parameters['W' + str(i)].shape == (layer_dims[i], layer_dims[i - 1]))
            assert (parameters['b' + str(i)].shape == (layer_dims[i], 1))
    elif type == "random":
        for i in range(1, L):
            parameters['W'+str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01
            parameters['b'+str(i)] = np.zeros((layer_dims[i], 1))
            assert (parameters['W' + str(i)].shape == (layer_dims[i], layer_dims[i - 1]))
            assert (parameters['b' + str(i)].shape == (layer_dims[i], 1))
    elif type == "he":
        for i in range(1, L):
            # scales by 1/sqrt(n_prev); classic He initialization uses sqrt(2/n_prev)
            parameters['W'+str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) / np.sqrt(layer_dims[i-1])
            parameters['b'+str(i)] = np.zeros((layer_dims[i], 1))
            assert (parameters['W' + str(i)].shape == (layer_dims[i], layer_dims[i - 1]))
            assert (parameters['b' + str(i)].shape == (layer_dims[i], 1))
    return parameters
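For reference, a hypothetical usage sketch (the layer sizes mirror the test at the end of this post):

    # Hypothetical usage: He initialization for a [12288, 100, 20, 1] network.
    params = initialize_parameters_deep([12288, 100, 20, 1], type='he')
    print(params['W1'].shape, params['b1'].shape)  # (100, 12288) (100, 1)
    print(params['W3'].shape, params['b3'].shape)  # (1, 20) (1, 1)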
Forward Propagation
Forward propagation process
Training set: $$X = [x^{(1)},x^{(2)},...,x^{(i)},...,x^{(m)}]$$; corresponding labels: $$Y = [y^{(1)},y^{(2)},...,y^{(i)},...,y^{(m)}]$$.
With $A^{[0]} = X$, each layer $l = 1, \dots, L$ computes
$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}(Z^{[l]})$$
where $g^{[l]}$ is ReLU for layers $1$ to $L-1$ and sigmoid for the output layer, so that $\hat{Y} = A^{[L]}$.
Loss for one example: $L(a^{[L](i)},y^{(i)}) = -y^{(i)}\log(a^{[L](i)}) - (1-y^{(i)})\log(1-a^{[L](i)})$
Cost function: $J = \frac{1}{m} \sum_{i=1}^{m} L(a^{[L](i)},y^{(i)})$
Code Implementation
- Forward pass of the linear part
def linear_forward(A_pre, W, b):
    """
    Forward propagation - linear part.
    :param A_pre: activations output by the previous layer
    :param W: weight matrix
    :param b: bias vector
    :return: linear output Z, cache (A_pre, W, b)
    """
    Z = np.dot(W, A_pre) + b
    assert(Z.shape == (W.shape[0], A_pre.shape[1]))  # A_pre.shape[1] is the number of examples
    cache = (A_pre, W, b)
    return Z, cache
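A small shape check with made-up dimensions may help keep the layout straight:

    # Hypothetical shape check: 4 input features, 3 units in the layer, 5 examples.
    A_pre = np.random.randn(4, 5)
    W = np.random.randn(3, 4)
    b = np.zeros((3, 1))
    Z, cache = linear_forward(A_pre, W, b)
    print(Z.shape)  # (3, 5)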
- Forward pass of a single layer [linear part + activation]
def linear_activation_forward(A_pre, W, b, activation):
    """
    Output of one network layer (linear part + activation function).
    :param A_pre: activations output by the previous layer
    :param W: weight matrix of the current layer
    :param b: bias vector
    :param activation: activation function of the current layer: sigmoid, relu
    :return:
        - A: activation value
        - cache: (linear_cache, activation_cache), reused to speed up the backward pass
    """
    if activation == 'sigmoid':
        Z, linear_cache = linear_forward(A_pre, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == 'relu':
        Z, linear_cache = linear_forward(A_pre, W, b)
        A, activation_cache = relu(Z)
    assert(A.shape == (W.shape[0], A_pre.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache
- Forward pass of the whole network
def L_model_forward(X, parameters):
    """
    Forward propagation of an L-layer deep neural network.
    Architecture: X --> (linear-relu)[L-1] --> (linear-sigmoid) --> AL.
    :param X: input data
    :param parameters: dictionary with the parameters of every layer
    :return:
        - AL: final output, one column per example
        - caches: list with the cache of every layer
    """
    caches = []
    A = X
    L = len(parameters) // 2  # integer division: two parameters (W, b) per layer
    # The first (L-1) layers share the same structure and can be handled in a for loop;
    # the last layer is computed separately.
    for i in range(1, L):
        A_pre = A
        A, cache = linear_activation_forward(A_pre, parameters['W'+str(i)], parameters['b'+str(i)],
                                             activation='relu')
        caches.append(cache)
    AL, cache = linear_activation_forward(A, parameters['W'+str(L)], parameters['b'+str(L)],
                                          activation='sigmoid')
    caches.append(cache)
    assert (AL.shape == (1, X.shape[1]))
    return AL, caches
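A hypothetical end-to-end forward pass on random data (the shapes, not the values, are the point):

    # Hypothetical forward pass: 10 flattened 64x64x3 "images" through a 3-layer network.
    X = np.random.randn(12288, 10)
    params = initialize_parameters_deep([12288, 100, 20, 1], type='he')
    AL, caches = L_model_forward(X, params)
    print(AL.shape, len(caches))  # (1, 10) 3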
Because every layer shares the same structure (linear part plus activation), the forward pass of the whole network is just the single-layer routine applied L times; the cost and the gradients of the parameters are then computed from the final output.
Backward Propagation
Backward propagation process
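Before the code, it helps to spell out the gradients that the helpers below compute; these follow directly from the forward-pass definitions above. For each layer $l$:
$$dZ^{[l]} = dA^{[l]} * g'^{[l]}(Z^{[l]})$$
$$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
with the starting point $dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1-A^{[L]}}\right)$ coming from the cross-entropy cost.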
Code Implementation
- Backward pass of the linear part
def linear_backward(dZ, cache):
    """
    Backward propagation - linear part.
    :param dZ: derivative of the loss with respect to the linear output Z
    :param cache: values cached during the forward pass (A_pre, W, b)
    :return:
        - dA_pre: derivative with respect to the previous layer's activations
        - dW: derivative with respect to the weights
        - db: derivative with respect to the bias
    """
    A_pre, W, b = cache
    m = A_pre.shape[1]
    dA_pre = np.dot(W.T, dZ)
    dW = 1./m * np.dot(dZ, A_pre.T)
    db = 1./m * np.sum(dZ, axis=1, keepdims=True)
    assert (dA_pre.shape == A_pre.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_pre, dW, db
- Backward pass of a single layer [linear part + activation]
def linear_activation_backward(dA, cache, activation):
    """
    Backward pass of one network layer.
    :param dA: derivative of the loss with respect to this layer's output
    :param cache: tuple cached during the forward pass (linear_cache, activation_cache)
    :param activation: activation function type: sigmoid, relu
    :return:
        - dA_pre: derivative with respect to the previous layer's activations
        - dW: derivative with respect to the weights
        - db: derivative with respect to the bias
    """
    linear_cache, activation_cache = cache
    if activation == 'relu':
        dZ = relu_backward(dA, activation_cache)
        dA_pre, dW, db = linear_backward(dZ, linear_cache)
    elif activation == 'sigmoid':
        dZ = sigmoid_backward(dA, activation_cache)
        dA_pre, dW, db = linear_backward(dZ, linear_cache)
    return dA_pre, dW, db
- Backward pass of the whole network
def L_model_backward(AL, Y, caches):
    """
    Backward propagation for the whole L-layer network.
    Architecture: X --> (linear+relu)[L-1] --> (linear+sigmoid) --> AL
    :param AL: final output of the forward pass
    :param Y: labels
    :param caches: list with the cache of every layer
    :return: grads: dictionary with the gradients of every layer's parameters
    """
    grads = {}
    L = len(caches)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # make sure AL and Y have the same shape
    # cost: cross-entropy, so
    dAL = -(np.divide(Y, AL) - np.divide(1-Y, 1-AL))
    # the last layer is handled separately, the remaining layers in a for loop
    current_cache = caches[L-1]
    grads['dA'+str(L)], grads['dW'+str(L)], grads['db'+str(L)] = linear_activation_backward(dAL, current_cache,
                                                                                            activation='sigmoid')
    # from the second-to-last layer backwards: linear+relu
    for i in reversed(range(L-1)):
        # backward: relu -> linear
        current_cache = caches[i]
        dA_pre_temp, dW_temp, db_temp = linear_activation_backward(grads['dA'+str(i+2)], current_cache, activation='relu')
        grads['dA'+str(i+1)] = dA_pre_temp
        grads["dW"+str(i+1)] = dW_temp
        grads["db"+str(i+1)] = db_temp
    return grads
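Continuing the hypothetical forward-pass example from above, the backward pass returns a dictionary of gradients whose shapes mirror the parameters:

    # Hypothetical continuation: random labels for the 10 examples used earlier.
    Y = (np.random.rand(1, 10) > 0.5).astype(float)
    grads = L_model_backward(AL, Y, caches)
    print(grads['dW3'].shape, grads['db1'].shape)  # (1, 20) (100, 1)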
Parameter Optimization
Parameter update process, using the gradient descent algorithm.
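For each layer $l$, the update is the standard gradient descent step with learning rate $\alpha$:
$$W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha \, db^{[l]}$$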
def update_parameters_with_gd(parameters, grads, learning_rate):
    """
    Parameter update.
    :param parameters: current parameters
    :param grads: gradients of the parameters
    :param learning_rate: learning rate, i.e. the size of the update step
    :return: parameters: updated parameters
    """
    L = len(parameters) // 2
    for i in range(L):
        parameters['W'+str(i+1)] = parameters['W'+str(i+1)] - learning_rate * grads["dW"+str(i+1)]
        parameters['b'+str(i+1)] = parameters['b'+str(i+1)] - learning_rate * grads["db"+str(i+1)]
    return parameters
Model Evaluation
Use a labelled data set to measure how well the trained model performs.
def score(params, X, y):
    """
    Evaluate the trained model on a test set.
    :param params: parameters obtained from training
    :param X: test set, shape [n_px*n_px*3, m]
    :param y: test labels, shape [1, m]
    :return: accuracy
    """
    m = X.shape[1]
    result = np.zeros((1, m))
    probs, _ = L_model_forward(X, params)
    for i in range(probs.shape[1]):
        if probs[0, i] >= 0.5:
            result[0, i] = 1
    accuracy = np.mean(result == y)
    return accuracy
Model Prediction
Given test inputs, output the predicted labels.
The procedure: run one forward pass to get the output, then compare the output against a threshold to obtain the class label.
def predict(params, X):
    """
    Predict the labels for the given images.
    :param params: trained parameters
    :param X: data to predict on
    :return: predicted labels
    """
    preds = np.zeros((1, X.shape[1]))
    probs, _ = L_model_forward(X, params)
    for i in range(X.shape[1]):
        if probs[0, i] >= 0.5:
            preds[0, i] = 1
    preds = np.squeeze(preds)
    return preds
Putting the Functions Together
def L_layer_model(X, Y, layer_dims, learning_rate=0.0052, num_iters=5000, print_cost=True):
    """
    L-layer network model: initialization plus the training loop.
    :param X: training data
    :param Y: training labels
    :param layer_dims: number of units in each layer
    :param learning_rate: learning rate
    :param num_iters: number of iterations
    :param print_cost: whether to print the cost as training progresses
    :return: parameters: trained parameters
    """
    np.random.seed(12)
    costs = []
    parameters = initialize_parameters_deep(layer_dims, type='he')
    for i in range(0, num_iters):
        AL, caches = L_model_forward(X, parameters)
        cost = compute_cost(AL, Y)
        grads = L_model_backward(AL, Y, caches)
        parameters = update_parameters_with_gd(parameters, grads, learning_rate)
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
    return parameters
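The helper compute_cost called in the loop above is not listed in this section. A minimal cross-entropy implementation consistent with the cost $J$ defined earlier could look like this (only the name and call signature come from the code above; the body is an assumption):

    def compute_cost(AL, Y):
        """Cross-entropy cost: J = -1/m * sum(y*log(a) + (1-y)*log(1-a))."""
        m = Y.shape[1]
        cost = -1./m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
        return np.squeeze(cost)  # turn [[cost]] into a scalar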
Test: 1000 iterations with a learning rate of 0.001.
layers_dims = [12288, 100, 20, 1]
params = L_layer_model(X_train, y_train, layers_dims, num_iters=1000, learning_rate=0.001)
results = score(params, test_X, test_Y)
print(results)
Cost changes during training:
Cost after iteration 0:0.697
Cost after iteration 100:0.620
Cost after iteration 200:0.599
Cost after iteration 300:0.581
Cost after iteration 400:0.564
Cost after iteration 500:0.549
Cost after iteration 600:0.534
Cost after iteration 700:0.520
Cost after iteration 800:0.506
Cost after iteration 900:0.492
Accuracy on test set: 52%
That is only slightly better than random guessing. A deeper network, better optimization algorithms, and hyperparameter tuning would all help raise the accuracy.
The key point is that we implemented a neural network ourselves.
Summary
- When trying to understand how the network computes, drawing a computation graph helps a great deal.
- When coding, check that the shape of every variable changes the way you expect!
- Optimization algorithms: Momentum, RMSprop, Adam
- Batch/mini-batch gradient update methods
- The larger the network, the more parameters it has and the longer training takes
Full code: >> click me