PyTorch Tutorial

1. TENSORS

Tensors are a specialized data structure very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.

Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors are also optimized for automatic differentiation. If you're familiar with ndarrays, you'll be right at home with the Tensor API.

import torch
import numpy as np

Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

From a NumPy array

Tensors can be created from NumPy arrays.

np_array = np.array(data)
x_np = torch.from_numpy(np_array)

From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Out:

Ones Tensor:
 tensor([[1, 1],
        [1, 1]])

Random Tensor:
 tensor([[0.9802, 0.0761],
        [0.8980, 0.4541]])

With random or constant values:

shape is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Out:

Random Tensor:
 tensor([[0.1383, 0.0385, 0.0745],
        [0.1842, 0.2020, 0.1991]])

Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Out:

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu

Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more, are comprehensively described here.

Each of these operations can be run on the GPU (typically at higher speeds than on a CPU). If you are using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

# We move our tensor to the GPU if available
if torch.cuda.is_available():
  tensor = tensor.to('cuda')

Try out some of the operations from the list. If you’re familiar with the NumPy API, you’ll find the Tensor API a breeze to use.

Standard numpy-like indexing and slicing:

tensor = torch.ones(4, 4)
print('First row: ',tensor[0])
print('First column: ', tensor[:, 0])
print('Last column:', tensor[..., -1])
tensor[:,1] = 0
print(tensor)

Out:

First row:  tensor([1., 1., 1., 1.])
First column:  tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Joining tensors You can use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining op that is subtly different from torch.cat.

# concatenate three copies of the 4x4 tensor along dim=1
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

Out:

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
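
For comparison, torch.stack joins tensors along a new dimension rather than an existing one; a minimal sketch reusing the 4x4 tensor from above:

t_cat = torch.cat([tensor, tensor, tensor], dim=1)      # joins along dim 1 -> shape (4, 12)
t_stack = torch.stack([tensor, tensor, tensor], dim=1)  # inserts a new dim -> shape (4, 3, 4)
print(t_cat.shape)
print(t_stack.shape)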

Arithmetic operations

# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)


# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

Single-element tensors If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it into a Python numerical value using item():

agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

Out:

12.0 <class 'float'>

In-place operations Operations that store the result into the operand are called in-place. They are denoted by a _ suffix. For example: x.copy_(y), x.t_(), will change x.

print(tensor, "\n")
tensor.add_(5)
print(tensor)

Out:

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

NOTE

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.
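
One concrete way this surfaces (a small sketch, not part of the original tutorial): autograd refuses an in-place update on a leaf tensor that requires gradients, because it would destroy the history needed for the backward pass.

x = torch.ones(3, requires_grad=True)
try:
    x.add_(1)  # in-place update on a leaf tensor that requires grad
except RuntimeError as err:
    print(err)  # autograd rejects the in-place update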

Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

Tensor to NumPy array

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

Out:

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]

A change in the tensor reflects in the NumPy array.

t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

Out:

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

NumPy array to Tensor

n = np.ones(5)
t = torch.from_numpy(n)

Changes in the NumPy array reflects in the tensor.

np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

Out:

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]

2. DATASETS & DATALOADERS

Code for processing data samples can get messy and hard to maintain; ideally we want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives, torch.utils.data.DataLoader and torch.utils.data.Dataset, that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model. You can find them here: Image Datasets, Text Datasets, and Audio Datasets.

Loading a Dataset

Here is an example of how to load the Fashion-MNIST dataset from TorchVision. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

We load the FashionMNIST Dataset with the following parameters:

  • root is the path where the train/test data is stored,
  • train specifies training or test dataset,
  • download=True downloads the data from the internet if it's not available at root,
  • transform and target_transform specify the feature and label transformations.

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Iterating and Visualizing the Dataset

We can index Datasets manually like a list: training_data[index]. We use matplotlib to visualize some samples in our training data.

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

Creating a Custom Dataset for your files

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. Take a look at this implementation; the FashionMNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file annotations_file.

In the next sections, we’ll break down what’s happening in each of these functions.

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

__init__

The __init__ function is run once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms (covered in more detail in the next section).

The labels.csv file looks like:

tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9
def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file)
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

__len__

The __len__ function returns the number of samples in our dataset.

Example:

def __len__(self):
    return len(self.img_labels)

__getitem__

The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image's location on disk, converts that to a tensor using read_image, retrieves the corresponding label from the csv data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label in a tuple.

def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    image = read_image(img_path)
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label

Preparing your data for training with DataLoaders

The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.

DataLoader is an iterable that abstracts this complexity for us in an easy API.

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
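
The multiprocessing mentioned above is enabled through DataLoader's num_workers argument; a minimal sketch (the value of 4 workers is just an illustrative choice, not part of the original tutorial):

# load batches with 4 background worker processes instead of the main process
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True, num_workers=4)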

Iterate through the DataLoader

We have loaded that dataset into the DataLoader and can iterate through the dataset as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, after we iterate over all batches the data is shuffled (for finer-grained control over the data loading order, take a look at Samplers).

# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")

Out:

Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Label: 0

3. TRANSFORMS

Data does not always come in the final processed form that is required for training machine learning algorithms. We use transforms to perform some manipulation of the data and make it suitable for training.

All TorchVision datasets have two parameters, transform to modify the features and target_transform to modify the labels, that accept callables containing the transformation logic. The torchvision.transforms module offers several commonly used transforms out of the box.

The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.

from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

ToTensor()

ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image's pixel intensity values to the range [0., 1.].
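
As a quick check of this behavior, here is a small sketch; the random uint8 array stands in for a PIL image and is not part of the original tutorial:

import numpy as np

sample = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)  # H x W x C, values 0-255
t = ToTensor()(sample)
print(t.shape, t.dtype, t.min().item(), t.max().item())  # torch.Size([1, 28, 28]) torch.float32, values within [0., 1.]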

Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we define a function to turn the integer into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and calls scatter_ which assigns a value=1 on the index as given by the label y.

target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))

4. BUILD THE NEURAL NETWORK

Neural networks comprise of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses the nn.Module. A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

In the following sections, we’ll build a neural network to classify images in the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training

We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let’s check to see if torch.cuda is available, else we continue to use the CPU.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))

Out:

Using cuda device

Define the Class

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        # flatten the 2D image into a 1D vector of 784 values
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, and move it to the device, and print its structure.

model = NeuralNetwork().to(device)
print(model)

Out:

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. Do not call model.forward() directly!

Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Out:

Predicted class: tensor([2], device='cuda:0')

Model Layers

Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

input_image = torch.rand(3,28,28)
print(input_image.size())

Out:

torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values ( the minibatch dimension (at dim=0) is maintained).

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

Out:

torch.Size([3, 784])

nn.Linear

The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

Out:

torch.Size([3, 20])

nn.ReLU

Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between our linear layers, but there’s other activations to introduce non-linearity in your model.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

nn.Sequential

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

nn.Softmax

The last linear layer of the neural network returns logits - raw values in [-infty, infty] - which are passed to the nn.Softmax module. The logits are scaled to values [0, 1] representing the model’s predicted probabilities for each class. dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
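
As a quick sanity check (not part of the original tutorial), summing over dim=1 shows that each sample's probabilities add up to 1:

print(pred_probab.sum(dim=1))  # each of the 3 rows sums to (approximately) 1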

Model Parameters

Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Out:

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0153, -0.0177,  0.0286,  ..., -0.0073, -0.0272,  0.0314],
        [ 0.0067, -0.0347,  0.0343,  ...,  0.0347, -0.0196, -0.0094]],
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0118, -0.0279], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0319, -0.0182, -0.0130,  ..., -0.0155, -0.0372, -0.0199],
        [-0.0051,  0.0356,  0.0397,  ...,  0.0419, -0.0151,  0.0283]],
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0041, -0.0237], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0028,  0.0333,  0.0018,  ..., -0.0088, -0.0022, -0.0389],
        [-0.0091,  0.0066, -0.0125,  ..., -0.0255,  0.0282,  0.0056]],
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([0.0201, 0.0236], device='cuda:0', grad_fn=<SliceBackward>)

5. AUTOMATIC DIFFERENTIATION

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradient for any computational graph.

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

A function that we apply to tensors to construct computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in grad_fn property of a tensor. You can find more information of Function in the documentation.

print('Gradient function for z =',z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Out:

Gradient function for z = <AddBackward0 object at 0x7f34c97b5588>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x7f34c97b5588>

Computing Gradients

To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to parameters, namely, we need ?loss/?w and ?loss/?b under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

loss.backward()
print(w.grad)
print(b.grad)

Out:

tensor([[0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158]])
tensor([0.0540, 0.3248, 0.3158])

Disabling Gradient Tracking

By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation. However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with torch.no_grad() block:

z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

Out:

True
False

Another way to achieve the same result is to use the detach() method on the tensor:

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

Out:

False
There are reasons you might want to disable gradient tracking:

  • To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (see the sketch below).
  • To speed up computations when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.
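
A minimal sketch of the frozen-parameter case, assuming the NeuralNetwork model from the previous section is still in scope and we only want to retrain its last linear layer:

# freeze every parameter, then re-enable gradients only for the final nn.Linear(512, 10)
for param in model.parameters():
    param.requires_grad = False
for param in model.linear_relu_stack[4].parameters():
    param.requires_grad = True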

6. OPTIMIZING MODEL PARAMETERS

Now that we have a model and data it’s time to train, validate and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an epoch) the model makes a guess about the output, calculates the error in its guess (loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent. For a more detailed walkthrough of this process, check out this video on backpropagation from 3Blue1Brown.

Prerequisite Code

We load the code from the previous sections on Datasets & DataLoaders and Build Model.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates (read more about hyperparameter tuning)

We define the following hyperparameters for training:

  • Number of Epochs - the number of times to iterate over the dataset

  • Batch Size - the number of data samples propagated through the network before the parameters are updated

  • Learning Rate - how much to update model parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.

learning_rate = 1e-3
batch_size = 64
epochs = 5

Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:

  • The Train Loop - iterate over the training dataset and try to converge to optimal parameters.

  • The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.

Let’s briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to see the Full Implementation of the optimization loop.

Loss Function

When presented with some training data, our untrained network is likely not to give the correct answer. Loss function measures the degree of dissimilarity of obtained result to the target value, and it is the loss function that we want to minimize during training. To calculate the loss we make a prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks, and nn.NLLLoss (Negative Log Likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.

We pass our model’s output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.

# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
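
To illustrate the claim that nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss, the two formulations below should produce the same value; the random logits and labels here are placeholders, not part of the original tutorial:

fake_logits = torch.randn(8, 10)           # a batch of 8 predictions over 10 classes
fake_labels = torch.randint(0, 10, (8,))   # 8 integer class labels
loss_a = nn.CrossEntropyLoss()(fake_logits, fake_labels)
loss_b = nn.NLLLoss()(nn.LogSoftmax(dim=1)(fake_logits), fake_labels)
print(loss_a.item(), loss_b.item())        # the two values match up to floating-point error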

Optimizer

Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; additionally, there are many different optimizers available in PyTorch such as ADAM and RMSProp, that work better for different kinds of models and data.

We initialize the optimizer by registering the model’s parameters that need to be trained, and passing in the learning rate hyperparameter.

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
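
If you want to try one of the other optimizers mentioned above, only this line changes; for example (an illustrative swap, not part of the original tutorial):

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)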

Inside the training loop, optimization happens in three steps:

  • Call optimizer.zero_grad() to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
  • Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.
  • Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.

Full Implementation

We define train_loop that loops over our optimization code, and test_loop that evaluates the model’s performance against our test data.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        # call optimizer.zero_grad() to reset the gradients of the model parameters
        optimizer.zero_grad()
        # call loss.backward() to backpropagate the prediction loss
        loss.backward()
        # once we have the gradients, call optimizer.step() to adjust the parameters
        # by the gradients collected in the backward pass
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We initialize the loss function and optimizer, and pass it to train_loop and test_loop. Feel free to increase the number of epochs to track the model’s improving performance.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

7. SAVE AND LOAD THE MODEL

In this section we will look at how to persist model state with saving, loading and running model predictions.

import torch
import torch.onnx as onnx
import torchvision.models as models

Saving and Loading Model Weights

PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:

model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')

To load model weights, you need to create an instance of the same model first, and then load the parameters using load_state_dict() method.

model = models.vgg16() # we do not specify pretrained=True, i.e. do not load default weights
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()

Be sure to call the model.eval() method before inferencing to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.

Saving and Loading Models with Shapes

When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network. We might want to save the structure of this class together with the model, in which case we can pass model (and not model.state_dict()) to the saving function:

torch.save(model, 'model.pth')

We can then load the model like this:

model = torch.load('model.pth')

This approach uses Python pickle module when serializing the model, thus it relies on the actual class definition to be available when loading the model.

Exporting Model to ONNX

PyTorch also has native ONNX export support. Given the dynamic nature of the PyTorch execution graph, however, the export process must traverse the execution graph to produce a persisted ONNX model. For this reason, a test variable of the appropriate size should be passed in to the export routine (in our case, we will create a dummy zero tensor of the correct size):

input_image = torch.zeros((1,3,224,224))
onnx.export(model, input_image, 'model.onnx')

There are a lot of things you can do with an ONNX model, including running inference on different platforms and in different programming languages. For more details, we recommend visiting the ONNX tutorial.
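
For example, the exported model can be loaded and run with the onnxruntime package (an assumption here: onnxruntime is installed separately, it is not part of PyTorch); a rough sketch:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('model.onnx')
input_name = session.get_inputs()[0].name
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)  # for the VGG16 model above, a (1, 1000) array of class scores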

Congratulations! You have completed the PyTorch beginner tutorial! Try revisiting the first page to see the tutorial in its entirety again. We hope this tutorial has helped you get started with deep learning on PyTorch. Good luck!
