This post walks through how an RNN actually computes its output and how PyTorch's RNN module carries out that computation, mainly as a note to myself. An RNN is a recurrent neural network; the name sounds complicated, but "recurrent" is really all there is to it. Inside, it is just one mathematical formula applied over and over again. If you don't believe it, let's verify it with code below.
PyTorch RNN overview
The official PyTorch documentation explains the RNN API in detail; here is the recurrence formula it implements.
$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$

Here $h_t$ is the hidden state at time t, $x_t$ is the input at time t, and $h_{t-1}$ is the hidden state of the previous time step (or the initial hidden state at time 0). tanh is the activation function and can be switched to relu via a parameter.
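As a minimal sketch, the same update can be written directly in PyTorch tensor code; the function name rnn_cell and the explicit weight/bias arguments below are just for illustration, not part of the nn.RNN API.

import torch

def rnn_cell(x_t, h_prev, W_ih, b_ih, W_hh, b_hh):
    # x_t: (input_size,), h_prev: (hidden_size,)
    # W_ih: (hidden_size, input_size), W_hh: (hidden_size, hidden_size), biases: (hidden_size,)
    # One step of the recurrence: h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    return torch.tanh(W_ih @ x_t + b_ih + W_hh @ h_prev + b_hh)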
Parameters
input_size: the number of features in the input x
hidden_size: the number of features in the hidden state h
num_layers: the number of stacked RNN layers; if set to 2, the second layer takes the first layer's output as its input
nonlinearity: the activation function, tanh by default; can be set to relu
bias: whether to use bias terms, True by default
batch_first: False by default; if set to True, the input and output tensors are shaped (batch, seq, feature)
dropout: 0 by default
bidirectional: False by default; if set to True, the RNN is bidirectional
Inputs
input (seq_len, batch, input_size); with batch_first=True the shape is (batch, seq_len, input_size)
h_0 (num_layers * num_directions, batch, hidden_size); num_directions is 2 when bidirectional is True, otherwise 1
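As a quick sketch with hypothetical sizes, these shapes can be checked directly; note that output comes back as (seq_len, batch, num_directions * hidden_size) and h_n has the same shape as h_0.

import torch
from torch import nn

# Hypothetical sizes, chosen only to illustrate the shapes described above.
input_size, hidden_size, num_layers = 10, 20, 2
seq_len, batch = 5, 3
rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
num_directions = 2  # 2 because bidirectional=True, otherwise 1

x = torch.randn(seq_len, batch, input_size)
h0 = torch.randn(num_layers * num_directions, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.shape)  # torch.Size([5, 3, 40])  -> (seq_len, batch, num_directions * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20])  -> (num_layers * num_directions, batch, hidden_size)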
Code
To keep things simple, the examples use batch_size=1 and a feature dimension of 10.
num_layers=1, seq_len=2
from torch import nn
import torch

# nn.RNN(input_size=10, hidden_size=10, num_layers=1)
rnn = nn.RNN(10, 10, 1)
inputR = torch.randn(2, 1, 10)  # (seq_len=2, batch=1, input_size=10)
h0 = torch.randn(1, 1, 10)      # (num_layers*num_directions=1, batch=1, hidden_size=10)
output, hn = rnn(inputR, h0)
Output
output:
tensor([[[-0.4582, 0.3975, 0.7992, 0.2567, 0.5510, 0.4386, -0.6069,
-0.2433, -0.0597, 0.2545]],
[[ 0.2327, 0.2221, -0.1225, 0.1365, 0.1384, 0.7557, 0.9028,
-0.4454, 0.1529, 0.0789]]], grad_fn=<StackBackward>)
hn:
tensor([[[ 0.2327, 0.2221, -0.1225, 0.1365, 0.1384, 0.7557, 0.9028,
-0.4454, 0.1529, 0.0789]]], grad_fn=<StackBackward>)
Running the formula by hand: the first time step
ih = rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
temp
Output
tensor([[-0.4582],
[ 0.3975],
[ 0.7992],
[ 0.2567],
[ 0.5510],
[ 0.4386],
[-0.6069],
[-0.2433],
[-0.0597],
[ 0.2545]])
The second time step
ih = rnn.weight_ih_l0.data.mm(inputR[1].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(temp) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
temp
Output
tensor([[ 0.2327],
[ 0.2221],
[-0.1225],
[ 0.1365],
[ 0.1384],
[ 0.7557],
[ 0.9028],
[-0.4454],
[ 0.1529],
[ 0.0789]])
Exactly the same as the module's output.
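The same check can be done in one shot with torch.allclose, reusing rnn, inputR, h0 and output from above and running the recurrence in a small loop over the time steps (just a verification sketch, not how the module is normally used):

with torch.no_grad():
    h = h0[0].view(10, 1)
    manual = []
    for t in range(inputR.size(0)):  # loop over seq_len
        ih = rnn.weight_ih_l0 @ inputR[t].view(10, 1) + rnn.bias_ih_l0.view(10, 1)
        hh = rnn.weight_hh_l0 @ h + rnn.bias_hh_l0.view(10, 1)
        h = torch.tanh(ih + hh)
        manual.append(h.view(1, 1, 10))
    manual = torch.cat(manual, dim=0)  # same layout as output: (seq_len, batch, hidden_size)
print(torch.allclose(manual, output, atol=1e-6))  # True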
num_layers=2, seq_len=1
from torch import nn
import torch

# nn.RNN(input_size=10, hidden_size=10, num_layers=2)
rnn = nn.RNN(10, 10, 2)
inputR = torch.randn(1, 1, 10)  # (seq_len=1, batch=1, input_size=10)
h0 = torch.randn(2, 1, 10)      # (num_layers*num_directions=2, batch=1, hidden_size=10)
output, hn = rnn(inputR, h0)
Output
output:
tensor([[[-0.6109, 0.1926, 0.7245, -0.4304, -0.2992, 0.0129, -0.1721,
0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)
hn:
tensor([[[ 0.0410, 0.2077, -0.6816, 0.0125, 0.3604, -0.4399, 0.7102,
-0.0217, 0.8443, -0.1684]],
[[-0.6109, 0.1926, 0.7245, -0.4304, -0.2992, 0.0129, -0.1721,
0.6340, -0.3601, -0.3554]]], grad_fn=<StackBackward>)
Next, let's run the formula by hand again and check that it reproduces the result above, which makes the RNN's internals easier to follow. Start with the first layer:
ih = rnn.weight_ih_l0.data.mm(inputR[0].squeeze().view(10,1)) + rnn.bias_ih_l0.data.view(10,1)
hh = rnn.weight_hh_l0.data.mm(h0[0].squeeze().view(10, 1)) + rnn.bias_hh_l0.data.view(10,1)
temp = torch.tanh(ih+hh)
Output
temp:
tensor([[ 0.0410],
[ 0.2077],
[-0.6816],
[ 0.0125],
[ 0.3604],
[-0.4399],
[ 0.7102],
[-0.0217],
[ 0.8443],
[-0.1684]])
As you can see, this matches the first part of hn exactly. Since num_layers is 2, the data has so far only passed through the first RNN layer; next it goes through the second layer.
ih1 = rnn.weight_ih_l1.data.mm(temp.data) + rnn.bias_ih_l1.data.view(10,1)
hh1 = rnn.weight_hh_l1.data.mm(h0[1].squeeze().view(10, 1)) + rnn.bias_hh_l1.data.view(10,1)
torch.tanh(ih1+hh1)
Output
tensor([[-0.6109],
[ 0.1926],
[ 0.7245],
[-0.4304],
[-0.2992],
[ 0.0129],
[-0.1721],
[ 0.6340],
[-0.3601],
[-0.3554]])
As you can see, this matches output exactly and is identical to the second part of hn.
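Reusing the two-layer rnn, inputR, h0, hn and output from this example, the whole check can be condensed with torch.allclose (again, just a verification sketch):

with torch.no_grad():
    # Layer 1: driven by the input and the first row of h0.
    h1 = torch.tanh(rnn.weight_ih_l0 @ inputR[0].view(10, 1) + rnn.bias_ih_l0.view(10, 1)
                    + rnn.weight_hh_l0 @ h0[0].view(10, 1) + rnn.bias_hh_l0.view(10, 1))
    # Layer 2: driven by layer 1's hidden state and the second row of h0.
    h2 = torch.tanh(rnn.weight_ih_l1 @ h1 + rnn.bias_ih_l1.view(10, 1)
                    + rnn.weight_hh_l1 @ h0[1].view(10, 1) + rnn.bias_hh_l1.view(10, 1))
print(torch.allclose(h1.view(1, 1, 10), hn[0:1], atol=1e-6))  # True: first part of hn
print(torch.allclose(h2.view(1, 1, 10), hn[1:2], atol=1e-6))  # True: second part of hn
print(torch.allclose(h2.view(1, 1, 10), output, atol=1e-6))   # True: identical to output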
Summary
After running through the code above, it should be clear how an RNN works internally.
Good reading on RNNs
The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
Attention and Augmented Recurrent Neural Networks