Sequence models (RNNs) are very useful for natural language processing and other sequence tasks, because they have "memory".
Notation:
- Superscript [l]: denotes an object associated with the l-th layer.
- Superscript (i): denotes an object associated with the i-th example.
- Superscript <t>: denotes an object at the t-th time step.
- Subscript i: denotes the i-th entry of a vector.
Import the packages we need:
import numpy as np
from rnn_utils import *
1. Forward propagation for the basic RNN
Basic RNN structure (in this example Tx = Ty):
Implementation steps
- Implement the computations needed for one time step of the RNN.
- Implement a loop over Tx time steps in order to process the inputs one at a time.
1.1 Implementing an RNN cell
A recurrent neural network can be seen as the repetition of a single cell. We first implement the computations for a single time step.
Implementation steps
- Compute the hidden state: a<t> = tanh(Wax x<t> + Waa a<t-1> + ba)
- Using a<t>, compute the prediction: yhat<t> = softmax(Wya a<t> + by)
- Store (a<t>, a<t-1>, x<t>, parameters) in cache
- Return a<t>, yhat<t> and cache
We vectorize over m examples, so x<t> has shape (n_x, m) and a<t> has shape (n_a, m).
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Arguments:
    xt -- shape (n_x, m)
    a_prev -- shape (n_a, m)
    parameters -- python dictionary containing:
        Wax -- shape (n_a, n_x)
        Waa -- shape (n_a, n_a)
        Wya -- shape (n_y, n_a)
        ba -- shape (n_a, 1)
        by -- shape (n_y, 1)
    Returns:
    a_next -- shape (n_a, m)
    yt_pred -- shape (n_y, m)
    cache -- tuple of values needed for the backward pass (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    # Compute the next hidden state and this time step's prediction
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    # Store values needed for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache
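A quick shape check of rnn_cell_forward, using made-up dimensions (n_x = 3, n_a = 5, n_y = 2, m = 10) and random values; it assumes softmax from rnn_utils is in scope:
np.random.seed(1)
xt = np.random.randn(3, 10)        # (n_x, m)
a_prev = np.random.randn(5, 10)    # (n_a, m)
parameters = {"Wax": np.random.randn(5, 3),
              "Waa": np.random.randn(5, 5),
              "Wya": np.random.randn(2, 5),
              "ba": np.random.randn(5, 1),
              "by": np.random.randn(2, 1)}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)    # (5, 10)
print(yt_pred.shape)   # (2, 10)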
1.2 RNN forward propagation
An RNN is the repetition of the cell you just built. If the input sequence is 10 time steps long, the RNN cell is copied 10 times. Each cell takes the previous cell's hidden state a<t-1> and the current time step's input x<t> as inputs, and outputs this time step's hidden state a<t> and prediction y<t>.
Implementation steps
- Create a tensor of zeros (a) that will store all the hidden states computed by the RNN.
- Initialize the hidden state: a_next = a0.
- Loop over the time steps, with incrementing index t:
  - Update the next hidden state and the cache by running rnn_cell_forward
  - Store the hidden state in a
  - Store the prediction in y
  - Append cache to caches
- Return a, y_pred and caches
def rnn_forward(x, a0, parameters):
    """
    Arguments:
    x -- input data, shape (n_x, m, T_x)
    a0 -- initial hidden state, shape (n_a, m)
    parameters -- python dictionary containing:
        Waa -- shape (n_a, n_a)
        Wax -- shape (n_a, n_x)
        Wya -- shape (n_y, n_a)
        ba -- shape (n_a, 1)
        by -- shape (n_y, 1)
    Returns:
    a -- shape (n_a, m, T_x)
    y_pred -- shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass (list of caches, x)
    """
    # Initialize "caches", which will contain the list of all per-step caches
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    # Initialize "a" and "y_pred" with zeros
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    # Initialize a_next to a0
    a_next = a0
    # Loop over the time steps
    for t in range(T_x):
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)
    # Store values needed for backward propagation in caches
    caches = (caches, x)
    return a, y_pred, caches
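The same kind of sanity check for rnn_forward, reusing the made-up dimensions above with a sequence of T_x = 4 time steps:
np.random.seed(1)
x = np.random.randn(3, 10, 4)      # (n_x, m, T_x)
a0 = np.random.randn(5, 10)        # (n_a, m)
parameters = {"Waa": np.random.randn(5, 5),
              "Wax": np.random.randn(5, 3),
              "Wya": np.random.randn(2, 5),
              "ba": np.random.randn(5, 1),
              "by": np.random.randn(2, 1)}
a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)        # (5, 10, 4)
print(y_pred.shape)   # (2, 10, 4)
print(len(caches))    # 2: (list of per-step caches, x)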
2. Long Short-Term Memory (LSTM)
Basic LSTM structure:
About the gates
- Forget gate:
  ft = sigmoid(Wf [a<t-1>, x<t>] + bf)
- Update gate:
  it = sigmoid(Wi [a<t-1>, x<t>] + bi)
- Updating the cell: the candidate value is
  cct = tanh(Wc [a<t-1>, x<t>] + bc)
  and the new cell state is:
  c<t> = ft * c<t-1> + it * cct
- Output gate:
  ot = sigmoid(Wo [a<t-1>, x<t>] + bo)
  a<t> = ot * tanh(c<t>)
Here [a<t-1>, x<t>] denotes a<t-1> stacked vertically on top of x<t>.
2.1 The LSTM cell
Implementation steps
- Concatenate a<t-1> and x<t> vertically into a single matrix concat (a small illustration follows this list).
- Compute the gate and state formulas above.
- Compute the prediction y<t>.
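As a small illustration of the first step (not part of the assignment code), the vertical stacking can also be written with np.concatenate; the shapes below are made up:
a_prev = np.zeros((5, 10))                     # (n_a, m)
xt = np.zeros((3, 10))                         # (n_x, m)
concat = np.concatenate((a_prev, xt), axis=0)  # a_prev on top of xt
print(concat.shape)                            # (8, 10) = (n_a + n_x, m)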
def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Arguments:
    xt -- shape (n_x, m)
    a_prev -- shape (n_a, m)
    c_prev -- shape (n_a, m)
    parameters -- python dictionary containing:
        Wf -- shape (n_a, n_a + n_x)
        bf -- shape (n_a, 1)
        Wi -- shape (n_a, n_a + n_x)
        bi -- shape (n_a, 1)
        Wc -- shape (n_a, n_a + n_x)
        bc -- shape (n_a, 1)
        Wo -- shape (n_a, n_a + n_x)
        bo -- shape (n_a, 1)
        Wy -- shape (n_y, n_a)
        by -- shape (n_y, 1)
    Returns:
    a_next -- shape (n_a, m)
    c_next -- shape (n_a, m)
    yt_pred -- shape (n_y, m)
    cache -- tuple of values needed for the backward pass
             (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    Note: ft / it / ot are the forget, update and output gates,
          cct is the candidate value (c tilde),
          c is the cell (memory) state.
    """
    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]
    n_x, m = xt.shape
    n_y, n_a = Wy.shape
    # Concatenate a_prev and xt into a single matrix
    concat = np.zeros((n_a + n_x, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt
    # Compute ft, it, cct, c_next, ot, a_next using the formulas above
    ft = sigmoid(np.dot(Wf, concat) + bf)
    it = sigmoid(np.dot(Wi, concat) + bi)
    cct = np.tanh(np.dot(Wc, concat) + bc)
    c_next = ft * c_prev + it * cct
    ot = sigmoid(np.dot(Wo, concat) + bo)
    a_next = ot * np.tanh(c_next)
    # Compute the prediction of the LSTM cell
    yt_pred = softmax(np.dot(Wy, a_next) + by)
    # Store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    return a_next, c_next, yt_pred, cache
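A shape check for lstm_cell_forward with made-up dimensions (n_x = 3, n_a = 5, n_y = 2, m = 10); sigmoid and softmax are assumed to come from rnn_utils:
np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
c_prev = np.random.randn(5, 10)
parameters = {"Wf": np.random.randn(5, 8), "bf": np.random.randn(5, 1),
              "Wi": np.random.randn(5, 8), "bi": np.random.randn(5, 1),
              "Wc": np.random.randn(5, 8), "bc": np.random.randn(5, 1),
              "Wo": np.random.randn(5, 8), "bo": np.random.randn(5, 1),
              "Wy": np.random.randn(2, 5), "by": np.random.randn(2, 1)}
a_next, c_next, yt_pred, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape, c_next.shape, yt_pred.shape)   # (5, 10) (5, 10) (2, 10)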
2.2 LSTM forward propagation
Now that you have implemented one step of the LSTM, you can iterate it over a sequence of Tx inputs with a for loop.
def lstm_forward(x, a0, parameters):
    """
    Arguments:
    x -- shape (n_x, m, T_x)
    a0 -- shape (n_a, m)
    parameters -- python dictionary containing:
        Wf -- shape (n_a, n_a + n_x)
        bf -- shape (n_a, 1)
        Wi -- shape (n_a, n_a + n_x)
        bi -- shape (n_a, 1)
        Wc -- shape (n_a, n_a + n_x)
        bc -- shape (n_a, 1)
        Wo -- shape (n_a, n_a + n_x)
        bo -- shape (n_a, 1)
        Wy -- shape (n_y, n_a)
        by -- shape (n_y, 1)
    Returns:
    a -- shape (n_a, m, T_x)
    y -- shape (n_y, m, T_x)
    c -- shape (n_a, m, T_x)
    caches -- tuple of values needed for the backward pass (list of all the caches, x)
    """
    # Initialize "caches", which will contain the list of all per-step caches
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wy"].shape
    # Initialize "a", "c" and "y" with zeros
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))
    # Initialize a_next and c_next
    a_next = a0
    c_next = np.zeros((n_a, m))
    # Loop over the time steps
    for t in range(T_x):
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        a[:, :, t] = a_next
        y[:, :, t] = yt
        c[:, :, t] = c_next
        caches.append(cache)
    # Store values needed for backward propagation in caches
    caches = (caches, x)
    return a, y, c, caches
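And for lstm_forward, with the same made-up parameter shapes and T_x = 7 time steps:
np.random.seed(1)
x = np.random.randn(3, 10, 7)      # (n_x, m, T_x)
a0 = np.random.randn(5, 10)        # (n_a, m)
parameters = {"Wf": np.random.randn(5, 8), "bf": np.random.randn(5, 1),
              "Wi": np.random.randn(5, 8), "bi": np.random.randn(5, 1),
              "Wc": np.random.randn(5, 8), "bc": np.random.randn(5, 1),
              "Wo": np.random.randn(5, 8), "bo": np.random.randn(5, 1),
              "Wy": np.random.randn(2, 5), "by": np.random.randn(2, 1)}
a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape, y.shape, c.shape)   # (5, 10, 7) (2, 10, 7) (5, 10, 7)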
3. Backward propagation
In practice, the backward pass is implemented with a deep learning framework (automatic differentiation), so it is not written out by hand here...
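For intuition only, here is a minimal hand-written sketch of the backward pass through a single basic RNN cell, using the cache produced by rnn_cell_forward above. This is not part of the original assignment; the function and gradient names (rnn_cell_backward_sketch, da_next, dWax, ...) are illustrative:
def rnn_cell_backward_sketch(da_next, cache):
    # Unpack the cache saved by rnn_cell_forward
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]
    # Backprop through tanh: since a_next = tanh(z), dz = (1 - a_next**2) * da_next
    dtanh = (1 - a_next ** 2) * da_next
    # Gradients of z = Wax.xt + Waa.a_prev + ba with respect to each input
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)
    dba = np.sum(dtanh, axis=1, keepdims=True)
    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}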
Notes
The code of the rnn_utils file is as follows:
import numpy as np

def softmax(x):
    # Column-wise softmax, shifted by the max for numerical stability
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def initialize_adam(parameters):
    """
    Initializes v and s as two python dictionaries with:
        - keys: "dW1", "db1", ..., "dWL", "dbL"
        - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
    Arguments:
    parameters -- python dictionary containing your parameters.
        parameters["W" + str(l)] = Wl
        parameters["b" + str(l)] = bl
    Returns:
    v -- python dictionary that will contain the exponentially weighted average of the gradient.
        v["dW" + str(l)] = ...
        v["db" + str(l)] = ...
    s -- python dictionary that will contain the exponentially weighted average of the squared gradient.
        s["dW" + str(l)] = ...
        s["db" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network
    v = {}
    s = {}
    # Initialize v, s. Input: "parameters". Outputs: "v, s".
    for l in range(L):
        v["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
        v["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
        s["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
        s["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
    return v, s
def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using Adam
    Arguments:
    parameters -- python dictionary containing your parameters:
        parameters['W' + str(l)] = Wl
        parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
        grads['dW' + str(l)] = dWl
        grads['db' + str(l)] = dbl
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    t -- number of Adam update steps taken so far (used for bias correction)
    learning_rate -- the learning rate, scalar
    beta1 -- exponential decay hyperparameter for the first moment estimates
    beta2 -- exponential decay hyperparameter for the second moment estimates
    epsilon -- hyperparameter preventing division by zero in Adam updates
    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    """
    L = len(parameters) // 2   # number of layers in the neural network
    v_corrected = {}           # bias-corrected first moment estimate
    s_corrected = {}           # bias-corrected second moment estimate
    # Perform the Adam update on all parameters
    for l in range(L):
        # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
        v["dW" + str(l + 1)] = beta1 * v["dW" + str(l + 1)] + (1 - beta1) * grads["dW" + str(l + 1)]
        v["db" + str(l + 1)] = beta1 * v["db" + str(l + 1)] + (1 - beta1) * grads["db" + str(l + 1)]
        # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
        v_corrected["dW" + str(l + 1)] = v["dW" + str(l + 1)] / (1 - beta1 ** t)
        v_corrected["db" + str(l + 1)] = v["db" + str(l + 1)] / (1 - beta1 ** t)
        # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
        s["dW" + str(l + 1)] = beta2 * s["dW" + str(l + 1)] + (1 - beta2) * (grads["dW" + str(l + 1)] ** 2)
        s["db" + str(l + 1)] = beta2 * s["db" + str(l + 1)] + (1 - beta2) * (grads["db" + str(l + 1)] ** 2)
        # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
        s_corrected["dW" + str(l + 1)] = s["dW" + str(l + 1)] / (1 - beta2 ** t)
        s_corrected["db" + str(l + 1)] = s["db" + str(l + 1)] / (1 - beta2 ** t)
        # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * v_corrected["dW" + str(l + 1)] / np.sqrt(s_corrected["dW" + str(l + 1)] + epsilon)
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * v_corrected["db" + str(l + 1)] / np.sqrt(s_corrected["db" + str(l + 1)] + epsilon)
    return parameters, v, s
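A toy usage of the two Adam helpers with a single made-up layer (W1, b1), just to show how they fit together:
np.random.seed(1)
parameters = {"W1": np.random.randn(2, 3), "b1": np.random.randn(2, 1)}
grads = {"dW1": np.random.randn(2, 3), "db1": np.random.randn(2, 1)}
v, s = initialize_adam(parameters)
parameters, v, s = update_parameters_with_adam(parameters, grads, v, s, t=1)
print(parameters["W1"].shape)   # (2, 3), after one Adam step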