This post walks through how an RNN actually computes its output and how PyTorch's RNN module carries out that computation, mainly as a note to myself. An RNN is a recurrent neural network; the name sounds complicated, but "recurrent" is really all there is to it. Inside, it is just one mathematical formula applied over and over again. If you don't believe it, let's verify it with code below.
PyTorch RNN overview
The official PyTorch documentation explains the RNN API in detail; here is the recurrence formula it implements.
$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$

Here $h_t$ is the hidden state at time t, $x_t$ is the input at time t, and $h_{t-1}$ is the hidden state of the previous time step (or the initial hidden state at time 0). tanh is the activation function and can be switched to relu via a parameter.
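As a minimal sketch, the same update can be written directly in PyTorch tensor code; the function name rnn_cell and the explicit weight/bias arguments below are just for illustration, not part of the nn.RNN API.

import torch

def rnn_cell(x_t, h_prev, W_ih, b_ih, W_hh, b_hh):
    # x_t: (input_size,), h_prev: (hidden_size,)
    # W_ih: (hidden_size, input_size), W_hh: (hidden_size, hidden_size), biases: (hidden_size,)
    # One step of the recurrence: h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    return torch.tanh(W_ih @ x_t + b_ih + W_hh @ h_prev + b_hh)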
Parameters
input_size: the number of features in the input x
hidden_size: the number of features in the hidden state h
num_layers: the number of stacked RNN layers; if set to 2, the second layer takes the first layer's output as its input
nonlinearity: the activation function, tanh by default; can be set to relu
bias: whether to use bias terms, True by default
batch_first: False by default; if set to True, the input and output tensors are shaped (batch, seq, feature)
dropout: 0 by default
bidirectional: False by default; if set to True, the RNN is bidirectional
Inputs
input (seq_len, batch, input_size); with batch_first=True the shape is (batch, seq_len, input_size)
h_0 (num_layers * num_directions, batch, hidden_size); num_directions is 2 when bidirectional is True, otherwise 1
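As a quick sketch with hypothetical sizes, these shapes can be checked directly; note that output comes back as (seq_len, batch, num_directions * hidden_size) and h_n has the same shape as h_0.

import torch
from torch import nn

# Hypothetical sizes, chosen only to illustrate the shapes described above.
input_size, hidden_size, num_layers = 10, 20, 2
seq_len, batch = 5, 3
rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
num_directions = 2  # 2 because bidirectional=True, otherwise 1

x = torch.randn(seq_len, batch, input_size)
h0 = torch.randn(num_layers * num_directions, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.shape)  # torch.Size([5, 3, 40])  -> (seq_len, batch, num_directions * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20])  -> (num_layers * num_directions, batch, hidden_size)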
Code
To keep things simple, the examples use batch_size=1 and a feature dimension of 10.
num_layers=1, seq_len=2
from torch import nn
import torch

# nn.RNN(input_size=10, hidden_size=10, num_layers=1)
rnn = nn.RNN(10, 10, 1)
inputR = torch.randn(2, 1, 10)  # (seq_len=2, batch=1, input_size=10)
h0 = torch.randn(1, 1, 10)      # (num_layers*num_directions=1, batch=1, hidden_size=10)
output, hn = rnn(inputR, h0)
Output
output:
tensor([[[-0.4582, 0.3975, 0.7992, 0.2567, 0.5510, 0.4386, -0.6069,
-0.2433, -0.0597, 0.2545]],
[[ 0.2327, 0.2221, -0.1225, 0.1365, 0.1384, 0.7557, 0.9028,
-0.4454, 0.1529, 0.0789]]], grad_fn=<StackBackward>)
hn:
tensor([[[ 0.2327, 0.2221, -0.1225, 0.1365, 0.1384, 0.7557, 0.9028,
-0.4454, 0.1529, 0.0789]]], grad_fn=<StackBackward>)
Running the formula by hand: the first time step
ih = rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
temp
Output
tensor([[-0.4582],
[ 0.3975],
[ 0.7992],
[ 0.2567],
[ 0.5510],
[ 0.4386],
[-0.6069],
[-0.2433],
[-0.0597],
[ 0.2545]])
The second time step
ih = rnn.weight_ih_l0.data.mm(inputR[1].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(temp) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
temp
Output
tensor([[ 0.2327],
[ 0.2221],
[-0.1225],
[ 0.1365],
[ 0.1384],
[ 0.7557],
[ 0.9028],
[-0.4454],
[ 0.1529],
[ 0.0789]])
Exactly the same as the module's output.
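The same check can be done in one shot with torch.allclose, reusing rnn, inputR, h0 and output from above and running the recurrence in a small loop over the time steps (just a verification sketch, not how the module is normally used):

with torch.no_grad():
    h = h0[0].view(10, 1)
    manual = []
    for t in range(inputR.size(0)):  # loop over seq_len
        ih = rnn.weight_ih_l0 @ inputR[t].view(10, 1) + rnn.bias_ih_l0.view(10, 1)
        hh = rnn.weight_hh_l0 @ h + rnn.bias_hh_l0.view(10, 1)
        h = torch.tanh(ih + hh)
        manual.append(h.view(1, 1, 10))
    manual = torch.cat(manual, dim=0)  # same layout as output: (seq_len, batch, hidden_size)
print(torch.allclose(manual, output, atol=1e-6))  # True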
num_layers=2, seq_len=1
from torch import nn
import torch

# nn.RNN(input_size=10, hidden_size=10, num_layers=2)
rnn = nn.RNN(10, 10, 2)
inputR = torch.randn(1, 1, 10)  # (seq_len=1, batch=1, input_size=10)
h0 = torch.randn(2, 1, 10)      # (num_layers*num_directions=2, batch=1, hidden_size=10)
output, hn = rnn(inputR, h0)
Output
output:
tensor([[[-0.6109, 0.1926, 0.7245, -0.4304, -0.2992, 0.0129, -0.1721,
0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)
hn:
tensor([[[ 0.0410, 0.2077, -0.6816, 0.0125, 0.3604, -0.4399, 0.7102,
-0.0217, 0.8443, -0.1684]],
[[-0.6109, 0.1926, 0.7245, -0.4304, -0.2992, 0.0129, -0.1721,
0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)
Next, let's run the formula by hand again and check that it reproduces the result above, which makes the RNN's internals easier to follow. Start with the first layer:
ih = rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
Output
temp:
tensor([[ 0.0410],
[ 0.2077],
[-0.6816],
[ 0.0125],
[ 0.3604],
[-0.4399],
[ 0.7102],
[-0.0217],
[ 0.8443],
[-0.1684]])
As you can see, this matches the first part of hn exactly. Since num_layers is 2, the data has so far only passed through the first RNN layer; next it goes through the second layer.
ih1 = rnn.weight_ih_l1.data.mm(temp.data) + rnn.bias_ih_l1.data.view(10,1)
hh1 = rnn.weight_hh_l1.data.mm(h0[1].squeeze().view(10, 1)) + rnn.bias_hh_l1.data.view(10,1)
torch.tanh(ih1+hh1)
Output
tensor([[-0.6109],
[ 0.1926],
[ 0.7245],
[-0.4304],
[-0.2992],
[ 0.0129],
[-0.1721],
[ 0.6340],
[-0.3601],
[-0.3554]])
As you can see, this matches output exactly and is identical to the second part of hn.
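Reusing the two-layer rnn, inputR, h0, hn and output from this example, the whole check can be condensed with torch.allclose (again, just a verification sketch):

with torch.no_grad():
    # Layer 1: driven by the input and the first row of h0.
    h1 = torch.tanh(rnn.weight_ih_l0 @ inputR[0].view(10, 1) + rnn.bias_ih_l0.view(10, 1)
                    + rnn.weight_hh_l0 @ h0[0].view(10, 1) + rnn.bias_hh_l0.view(10, 1))
    # Layer 2: driven by layer 1's hidden state and the second row of h0.
    h2 = torch.tanh(rnn.weight_ih_l1 @ h1 + rnn.bias_ih_l1.view(10, 1)
                    + rnn.weight_hh_l1 @ h0[1].view(10, 1) + rnn.bias_hh_l1.view(10, 1))
print(torch.allclose(h1.view(1, 1, 10), hn[0:1], atol=1e-6))  # True: first part of hn
print(torch.allclose(h2.view(1, 1, 10), hn[1:2], atol=1e-6))  # True: second part of hn
print(torch.allclose(h2.view(1, 1, 10), output, atol=1e-6))   # True: identical to output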
Summary
After running through the code above, it should be clear how an RNN works internally.
Good reading on RNNs
The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
Attention and Augmented Recurrent Neural Networks