This tutorial is translated from the official PyTorch tutorial.
This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.
At its core, PyTorch provides two main features:
- an n-dimensional Tensor, similar to numpy but able to run on GPUs
- automatic differentiation for building and training neural networks
We will use a fully-connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit random data, minimizing the Euclidean distance between the network output and the true output.
Tensors
Warm-up: numpy
Before introducing PyTorch, we will first implement the network using numpy.
numpy provides an n-dimensional array object and many functions for manipulating these arrays. numpy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily use numpy operations to implement the forward and backward passes by hand and fit a two-layer network to random data:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
PyTorch: Tensors
numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy is not enough for modern deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: it is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Like numpy, PyTorch Tensors know nothing about deep learning, computation graphs, or gradients; they are a generic tool for scientific computing.
However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it to a new datatype.
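For example, the cast might look like this (a minimal sketch, not part of the tutorial's running example; it assumes a CUDA-capable machine and the legacy pre-0.4 Tensor-type API used throughout this tutorial):

import torch

x = torch.randn(3, 3)                    # a CPU Tensor of type torch.FloatTensor
if torch.cuda.is_available():
    x = x.type(torch.cuda.FloatTensor)   # same values, now stored on the GPU
    # x = x.cuda() is an equivalent way to move a Tensor onto the GPU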
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we need to manually implement the forward and backward passes through the network:
import torch

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Autograd
PyTorch: Variables and autograd
In the examples above, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very hairy for large, complex networks.
Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computation graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you easily compute gradients.
This sounds complicated, but it is pretty simple to use in practice. We wrap our Tensors in Variable objects; a Variable represents a node in a computation graph. If x is a Variable, then x.data is a Tensor, and x.grad is another Variable holding the gradient of x with respect to some scalar value.
PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation that you can perform on a Tensor also works on Variables; the difference is that using Variables defines a computation graph, allowing you to automatically compute gradients.
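As a small illustration of this API (a sketch only; the shapes and values are arbitrary and unrelated to the two-layer network below):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * 3).sum()    # building this expression also builds a small computation graph
y.backward()         # backpropagate from the scalar y through that graph
print(x.data)        # the Tensor wrapped by x
print(x.grad)        # Variable holding dy/dx, here a 2x2 Tensor filled with 3s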
Here we use Variables and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
PyTorch: Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Variables containing input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network:
import torch
from torch.autograd import Variable


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Function by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    def forward(self, input):
        """
        In the forward pass we receive a Tensor containing the input and return a
        Tensor containing the output. You can cache arbitrary Tensors for use in the
        backward pass using the save_for_backward method.
        """
        self.save_for_backward(input)
        return input.clamp(min=0)

    def backward(self, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Construct an instance of our MyReLU class to use in our network
    relu = MyReLU()

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
TensorFlow: Static Graphs
PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computation graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computation graphs are static, while PyTorch uses dynamic computation graphs.
In TensorFlow, we define the computation graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. In PyTorch, each forward pass defines a new computation graph.
Static graphs are nice because you can optimize the graph up front; for example, a framework might decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If you are reusing the same graph over and over, this potentially costly up-front optimization can be amortized as the same graph is rerun repeatedly.
One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example, a recurrent network might be unrolled for different numbers of time steps for each data point, and this unrolling can be implemented as a loop.
With a static graph, the loop construct needs to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build a graph on the fly for each example, we can use normal imperative flow control to perform computation that differs for each input.
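For instance, in PyTorch a loop whose length depends on values computed during the forward pass is just ordinary Python (a minimal sketch, separate from the running example, using the same pre-0.4 Variable API as the rest of this tutorial):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x * 2
while y.data.norm() < 1000:   # how many iterations run depends on the data itself
    y = y * 2                 # each iteration extends this forward pass's graph
y.sum().backward()            # autograd backpropagates through however many steps ran
print(x.grad)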
To contrast with the PyTorch autograd example above, here we use TensorFlow to fit a simple two-layer network:
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)
nn module
PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Variables and computes output Variables, and may also hold internal state such as Variables containing learnable parameters. The nn package also defines a set of loss functions that are useful when training neural networks.
In this example we use the nn package to implement our two-layer network:
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent; each parameter is a Variable.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data
PyTorch: optim
Up to this point we have updated the weights of our models by manually mutating the .data member of the Variables holding the learnable parameters. This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers such as AdaGrad, RMSProp, Adam, and so on.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used algorithms.
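As a quick illustration of the idea (a sketch that assumes the model, loss_fn, x, y, and learning_rate names from the nn example above), a single step of torch.optim.SGD performs the same update that the manual parameter loop wrote by hand:

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

optimizer.zero_grad()          # clear gradients accumulated by any previous step
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()               # plain SGD: param.data -= learning_rate * param.grad.data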
In this example we will use the nn package to define our model as before, but we will optimize the model with the Adam algorithm provided by the optim package:
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model and loss function
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Variables it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Variables it will update (which are the learnable weights
    # of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
PyTorch: Custom nn Modules
Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by subclassing nn.Module and defining a forward function that receives input Variables and produces output Variables using other Modules or other autograd operations on Variables.
In this example we implement our two-layer network as a custom Module subclass:
import torch
from torch.autograd import Variable


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
PyTorch: Control Flow + Weight Sharing
As an example of dynamic graphs and weight sharing, we implement a different model: a fully-connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.
import random
import torch
from torch.autograd import Variable


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Examples
The code for the examples above can be found at the following links:
Tensors
autograd
PyTorch: Variables and autograd
PyTorch: Defining new autograd functions
nn Module
PyTorch: nn