來源:https://hyunhp.tistory.com/448
1. Intuitive diagrams of the RNN cell and RNN
RNN → Recurrent Neural Network
You can think of the recurrent neural network as the repeated use of a single cell that performs the computations for a single time step.
2. Dimensions of the input x
2.1 Input with $n_x$ number of units
● For a single time step of a single input example, $x^{\langle t \rangle}$ is a one-dimensional input vector.
● Using language as an example, a language with a 5000-word vocabulary could be one-hot encoded into a vector that has 5000 units, so $x^{\langle t \rangle}$ could have the shape (5000,).
● The notation $n_x$ is used here to denote the number of units in a single time step of a single training example.
2.2 Time steps of size $T_x$
● A recurrent neural network has multiple time steps, which you'll index with t.
● In the lessons, you saw a single training example $x$ consisting of multiple time steps $T_x$. In this notebook, $T_x$ will denote the number of time steps in the longest sequence.
2.3 Batches of size m
● Let's say we have mini-batches, each with 20 training examples.
● To benefit from vectorization, you'll stack 20 columns of $x^{\langle t \rangle}$ examples.
● For example, this tensor has the shape (5000, 20, 10).
● You'll use m to denote the number of training examples.
● So, the shape of a mini-batch is $(n_x, m)$.
2.4 3D tensor of shape $(n_x, m, T_x)$
● The 3-dimensional tensor x of shape $(n_x, m, T_x)$ represents the input x that is fed into the RNN.
2.5 Taking a 2D slice for each time step: $x^{\langle t \rangle}$
● At each time step, you'll use a mini-batch of training examples (not just a single example).
● So, for each time step t, you'll use a 2D slice of shape $(n_x, m)$.
● This 2D slice is referred to as $x^{\langle t \rangle}$. The variable name in the code is xt (see the short numpy sketch below).
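To make these shapes concrete, here is a minimal numpy sketch using the example dimensions from above ($n_x$ = 5000, m = 20, $T_x$ = 10); the variable names are illustrative only:

import numpy as np

n_x, m, T_x = 5000, 20, 10        # units per input vector, examples per mini-batch, time steps
x = np.zeros((n_x, m, T_x))       # 3D input tensor fed into the RNN

t = 3                             # pick any time step index
xt = x[:, :, t]                   # 2D slice x<t> for time step t
print(xt.shape)                   # (5000, 20), i.e. (n_x, m)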
3. Dimensions of the hidden state a
The activation $a^{\langle t \rangle}$ that is passed to the RNN from one time step to another is called a "hidden state".
3.1 Dimensions of hidden state a
● Similar to the input tensor x, the hidden state for a single training example is a vector of length $n_a$.
● If you include a mini-batch of m training examples, the shape of a mini-batch is $(n_a, m)$.
● When you include the time step dimension, the shape of the hidden state is $(n_a, m, T_x)$.
● You'll loop through the time steps with index t, and work with a 2D slice of the 3D tensor.
● This 2D slice is referred to as $a^{\langle t \rangle}$.
● In the code, the variable names used are either a_prev or a_next, depending on the function being implemented.
● The shape of this 2D slice is $(n_a, m)$.
4. Dimensions of the prediction $\hat{y}$
● Similar to the inputs and hidden states, $\hat{y}$ is a 3D tensor of shape $(n_y, m, T_y)$.
      ■ $n_y$: number of units in the vector representing the prediction
      ■ m: number of examples in a mini-batch
      ■ $T_y$: number of time steps in the prediction
● For a single time step t, a 2D slice $\hat{y}^{\langle t \rangle}$ has shape $(n_y, m)$.
● In the code, the variable names are (a short shape sketch follows below):
      ■ y_pred: $\hat{y}$
      ■ yt_pred: $\hat{y}^{\langle t \rangle}$
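Similarly, here is a small sketch of the hidden state and prediction tensors from Sections 3 and 4, with hypothetical sizes ($n_a$ = 5, $n_y$ = 2, m = 20, $T_x$ = $T_y$ = 10) chosen only for illustration:

import numpy as np

n_a, n_y, m, T_x = 5, 2, 20, 10    # hidden units, prediction units, batch size, time steps (hypothetical)
a = np.zeros((n_a, m, T_x))        # hidden states for every time step
y_pred = np.zeros((n_y, m, T_x))   # predictions for every time step (here T_y equals T_x)

t = 3
a_t = a[:, :, t]                   # a<t>, shape (n_a, m)
yt_pred = y_pred[:, :, t]          # y_hat<t>, shape (n_y, m)
print(a_t.shape, yt_pred.shape)    # (5, 20) (2, 20)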
5. Building the RNN
Here is how you can implement an RNN:
Steps:
      ● Implement the calculations needed for one time step of the RNN.
      ● Implement a loop over time steps in order to process all the inputs, one at a time.
About the RNN cell
You can think of the recurrent neural network as the repeated use of a single cell. First, you'll implement the computations for a single time step.
RNN cell versus rnn_cell_forward:
● Note that an RNN cell outputs the hidden state $a^{\langle t \rangle}$.
      ■ The RNN cell is shown in the figure as the inner box with solid lines.
● The function that you'll implement, rnn_cell_forward, also calculates the prediction $\hat{y}^{\langle t \rangle}$.
      ■ rnn_cell_forward is shown in the figure as the outer box with dashed lines.
The following figure describes the operations for a single time step of an RNN cell:
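For reference, the two computations carried out inside the cell (they are exactly what the two graded lines in the code below implement) are:

$$a^{\langle t \rangle} = \tanh(W_{ax}\, x^{\langle t \rangle} + W_{aa}\, a^{\langle t-1 \rangle} + b_a)$$
$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya}\, a^{\langle t \rangle} + b_y)$$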
The code is as follows:
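Note that the graded cell below uses np and a softmax helper that are not defined anywhere in this post; in the original notebook they come from its import cell. A minimal sketch so the snippet runs standalone (this softmax definition is an assumption: a standard, numerically stable column-wise softmax):

import numpy as np

def softmax(z):
    # column-wise softmax over the n_y dimension (axis 0);
    # assumed helper, not taken verbatim from the original notebook
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / np.sum(e_z, axis=0, keepdims=True)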
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache
Run the code above:
def rnn_cell_forward_tests(rnn_cell_forward):
    np.random.seed(1)
    xt_tmp = np.random.randn(3, 10)
    a_prev_tmp = np.random.randn(5, 10)
    parameters_tmp = {}
    parameters_tmp['Waa'] = np.random.randn(5, 5)
    parameters_tmp['Wax'] = np.random.randn(5, 3)
    parameters_tmp['Wya'] = np.random.randn(2, 5)
    parameters_tmp['ba'] = np.random.randn(5, 1)
    parameters_tmp['by'] = np.random.randn(2, 1)
    a_next_tmp, yt_pred_tmp, cache_tmp = rnn_cell_forward(xt_tmp, a_prev_tmp, parameters_tmp)
    print("a_next[4] = \n", a_next_tmp[4])
    print("a_next.shape = \n", a_next_tmp.shape)
    print("yt_pred[1] =\n", yt_pred_tmp[1])
    print("yt_pred.shape = \n", yt_pred_tmp.shape)
# UNIT TESTS
rnn_cell_forward_tests(rnn_cell_forward)
6. RNN Forward Pass
A recurrent neural network (RNN) is the repetition of the RNN cell that you've just built.
      ● If your input sequence of data is 10 time steps long, then you will re-use the RNN cell 10 times.
Each cell takes two inputs at each time step:
      ● $a^{\langle t-1 \rangle}$: the hidden state from the previous cell
      ● $x^{\langle t \rangle}$: the current time step's input data
It has two outputs at each time step:
      ● a hidden state ($a^{\langle t \rangle}$)
      ● a prediction ($\hat{y}^{\langle t \rangle}$)
The weights and biases ($W_{ax}$, $W_{aa}$, $W_{ya}$, $b_a$, $b_y$) are re-used each time step.
      ● They are maintained between calls to rnn_cell_forward in the 'parameters' dictionary.
(This point is not mentioned explicitly in the code above.)
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: rnn_forward
def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###
    # initialize "a" and "y_pred" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    # Initialize a_next (≈1 line)
    a_next = a0
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)
    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
Run the code above:
def rnn_forward_test(rnn_forward):
    np.random.seed(1)
    x_tmp = np.random.randn(3, 10, 4)
    a0_tmp = np.random.randn(5, 10)
    parameters_tmp = {}
    parameters_tmp['Waa'] = np.random.randn(5, 5)
    parameters_tmp['Wax'] = np.random.randn(5, 3)
    parameters_tmp['Wya'] = np.random.randn(2, 5)
    parameters_tmp['ba'] = np.random.randn(5, 1)
    parameters_tmp['by'] = np.random.randn(2, 1)
    a_tmp, y_pred_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
    print("a[4][1] = \n", a_tmp[4][1])
    print("a.shape = \n", a_tmp.shape)
    print("y_pred[1][3] =\n", y_pred_tmp[1][3])
    print("y_pred.shape = \n", y_pred_tmp.shape)
    print("caches[1][1][3] =\n", caches_tmp[1][1][3])
    print("len(caches) = \n", len(caches_tmp))

# UNIT TEST
rnn_forward_test(rnn_forward)
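A note on the caches value printed above: as built in rnn_forward, caches is the tuple (list of per-step caches, x), so caches_tmp[1] is simply the stored input x and len(caches_tmp) is 2. A minimal sketch of unpacking it, assuming these lines were added inside rnn_forward_test right after the call:

step_caches, x_stored = caches_tmp          # caches = (list of per-step caches, x)
print(len(step_caches))                     # T_x = 4, one cache tuple per time step
print(x_stored.shape)                       # (3, 10, 4), the same x_tmp that was passed in
a_next_0, a_prev_0, xt_0, params_0 = step_caches[0]   # contents of the first step's cache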
7. Summary
You've successfully built the forward propagation of a recurrent neural network from scratch.
Situations when this RNN will perform better:
● This will work well enough for some applications, but it suffers from vanishing gradients.
● The RNN works best when each output $\hat{y}^{\langle t \rangle}$ can be estimated using "local" context.
● "Local" context refers to information that is close to the prediction's time step t.
● More formally, local context refers to inputs $x^{\langle t' \rangle}$ and predictions $\hat{y}^{\langle t \rangle}$ where $t'$ is close to $t$.
What you should remember:
● The recurrent neural network, or RNN, is essentially the repeated use of a single cell.
● A basic RNN reads inputs one at a time, and remembers information through the hidden layer activations (hidden states) that are passed from one time step to the next.
      ■ The time step dimension determines how many times to re-use the RNN cell.
● Each cell takes two inputs at each time step:
      ■ the hidden state from the previous cell
      ■ the current time step's input data
● Each cell has two outputs at each time step:
      ■ a hidden state
      ■ a prediction