This tutorial is translated from the official PyTorch tutorial.
This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.
At its core, PyTorch provides two main features:
- an n-dimensional Tensor, similar to numpy but able to run on GPUs
- automatic differentiation for building and training neural networks
We will use a fully-connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit random data, minimizing the Euclidean distance between the network output and the true output.
Tensors
Warm-up: numpy
Before introducing PyTorch, we will first implement the network using numpy.
numpy provides an n-dimensional array object and many functions for manipulating these arrays. numpy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily use numpy operations to implement the forward and backward passes by hand and fit a two-layer network to random data:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
PyTorch: Tensors
numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy is not enough for modern deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: it is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Like numpy, PyTorch Tensors know nothing about deep learning, computation graphs, or gradients; they are a generic tool for scientific computing.
However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it to a new datatype.
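For example, the cast might look like this (a minimal sketch, not part of the tutorial's running example; it assumes a CUDA-capable machine and the legacy pre-0.4 Tensor-type API used throughout this tutorial):

import torch

x = torch.randn(3, 3)                    # a CPU Tensor of type torch.FloatTensor
if torch.cuda.is_available():
    x = x.type(torch.cuda.FloatTensor)   # same values, now stored on the GPU
    # x = x.cuda() is an equivalent way to move a Tensor onto the GPU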
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we need to manually implement the forward and backward passes through the network:
import torch

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
Autograd
PyTorch: Variables and autograd
In the examples above, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very hairy for large, complex networks.
Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computation graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you easily compute gradients.
This sounds complicated, but it is pretty simple to use in practice. We wrap our Tensors in Variable objects; a Variable represents a node in a computation graph. If x is a Variable, then x.data is a Tensor, and x.grad is another Variable holding the gradient of x with respect to some scalar value.
PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation that you can perform on a Tensor also works on Variables; the difference is that using Variables defines a computation graph, allowing you to automatically compute gradients.
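As a small illustration of this API (a sketch only; the shapes and values are arbitrary and unrelated to the two-layer network below):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * 3).sum()    # building this expression also builds a small computation graph
y.backward()         # backpropagate from the scalar y through that graph
print(x.data)        # the Tensor wrapped by x
print(x.grad)        # Variable holding dy/dx, here a 2x2 Tensor filled with 3s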
Here we use Variables and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
PyTorch: Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Variables containing input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network:
import torch
from torch.autograd import Variable


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Function by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    def forward(self, input):
        """
        In the forward pass we receive a Tensor containing the input and return a
        Tensor containing the output. You can cache arbitrary Tensors for use in the
        backward pass using the save_for_backward method.
        """
        self.save_for_backward(input)
        return input.clamp(min=0)

    def backward(self, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Construct an instance of our MyReLU class to use in our network
    relu = MyReLU()

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
TensorFlow: Static Graphs
PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computation graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computation graphs are static, while PyTorch uses dynamic computation graphs.
In TensorFlow, we define the computation graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. In PyTorch, each forward pass defines a new computation graph.
Static graphs are nice because you can optimize the graph up front; for example, a framework might decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If you are reusing the same graph over and over, this potentially costly up-front optimization can be amortized as the same graph is rerun repeatedly.
One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example, a recurrent network might be unrolled for different numbers of time steps for each data point, and this unrolling can be implemented as a loop.
With a static graph, the loop construct needs to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build a graph on the fly for each example, we can use normal imperative flow control to perform computation that differs for each input.
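For instance, in PyTorch a loop whose length depends on values computed during the forward pass is just ordinary Python (a minimal sketch, separate from the running example, using the same pre-0.4 Variable API as the rest of this tutorial):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x * 2
while y.data.norm() < 1000:   # how many iterations run depends on the data itself
    y = y * 2                 # each iteration extends this forward pass's graph
y.sum().backward()            # autograd backpropagates through however many steps ran
print(x.grad)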
To contrast with the PyTorch autograd example above, here we use TensorFlow to fit a simple two-layer network:
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)
nn module
PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Variables and computes output Variables, and may also hold internal state such as Variables containing learnable parameters. The nn package also defines a set of loss functions that are useful when training neural networks.
In this example we use the nn package to implement our two-layer network:
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent; each parameter is a Variable.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data
PyTorch: optim
Up to this point we have updated the weights of our models by manually mutating the .data member of the Variables holding the learnable parameters. This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers such as AdaGrad, RMSProp, Adam, and so on.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used algorithms.
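As a quick illustration of the idea (a sketch that assumes the model, loss_fn, x, y, and learning_rate names from the nn example above), a single step of torch.optim.SGD performs the same update that the manual parameter loop wrote by hand:

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

optimizer.zero_grad()          # clear gradients accumulated by any previous step
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()               # plain SGD: param.data -= learning_rate * param.grad.data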
In this example we will use the nn package to define our model as before, but we will optimize the model with the Adam algorithm provided by the optim package:
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model and loss function
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Variables it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Variables it will update (which are the learnable weights
    # of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
PyTorch: Custom nn Modules
Sometimes you will want to specify models that are more complex than a sequence of existing Modules; for these cases you can define your own Modules by subclassing nn.Module and defining a forward function that receives input Variables and produces output Variables using other Modules or other autograd operations on Variables.
In this example we implement our two-layer network as a custom Module subclass:
import torch
from torch.autograd import Variable


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
PyTorch: Control Flow + Weight Sharing
As an example of dynamic graphs and weight sharing, we implement a different model: a fully-connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.
import random
import torch
from torch.autograd import Variable


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Examples
The code for the examples above can be found at the following links:
Tensors
autograd
PyTorch: Variables and autograd
PyTorch: Defining new autograd functions
nn Module
PyTorch: nn