1. TENSORS
Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.
Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors are also optimized for automatic differentiation. If you're familiar with ndarrays, you'll be right at home with the Tensor API.
import torch
import numpy as np
Initializing a Tensor
Tensors can be initialized in various ways. Take a look at the following examples:
Directly from data
Tensors can be created directly from data. The data type is automatically inferred.
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
From a NumPy array
Tensors can be created from NumPy arrays.
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
From another tensor:
The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")
Out:
Ones Tensor:
tensor([[1, 1],
[1, 1]])
Random Tensor:
tensor([[0.9802, 0.0761],
[0.8980, 0.4541]])
With random or constant values:
shape is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
Out:
Random Tensor:
tensor([[0.1383, 0.0385, 0.0745],
[0.1842, 0.2020, 0.1991]])
Ones Tensor:
tensor([[1., 1., 1.],
[1., 1., 1.]])
Zeros Tensor:
tensor([[0., 0., 0.],
[0., 0., 0.]])
Attributes of a Tensor
Tensor attributes describe their shape, datatype, and the device on which they are stored.
tensor = torch.rand(3,4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
Out:
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
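As a small aside (not part of the original example), dtype and device can also be set explicitly at creation time; a minimal sketch, assuming a GPU may or may not be present:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
t64 = torch.zeros(3, 4, dtype=torch.float64, device=device)
print(t64.dtype)   # torch.float64
print(t64.device)  # cuda:0 if a GPU is available, otherwise cpu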
Operations on Tensors
Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more are comprehensively described here.
Each of these operations can be run on the GPU (typically at higher speeds than on a CPU). If you are using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.
By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!
# We move our tensor to the GPU if available
if torch.cuda.is_available():
tensor = tensor.to('cuda')
Try out some of the operations from the list. If you’re familiar with the NumPy API, you’ll find the Tensor API a breeze to use.
Standard numpy-like indexing and slicing:
tensor = torch.ones(4, 4)
print('First row: ',tensor[0])
print('First column: ', tensor[:, 0])
print('Last column:', tensor[..., -1])
tensor[:,1] = 0
print(tensor)
Out:
First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
[1., 0., 1., 1.],
[1., 0., 1., 1.],
[1., 0., 1., 1.]])
Joining tensors You can use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining op that is subtly different from torch.cat.
# Concatenate three copies of the 4x4 tensor along dim=1, giving a 4x12 result
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
Out:
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
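For comparison (an added sketch, not shown in the original output), torch.stack joins tensors along a new dimension rather than an existing one:
t2 = torch.stack([tensor, tensor, tensor], dim=0)  # introduces a new leading dimension
print(t2.shape)  # torch.Size([3, 4, 4]) - three stacked 4x4 tensors
print(t1.shape)  # torch.Size([4, 12]) - cat along dim=1 keeps two dimensions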
Arithmetic operations
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)
# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
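To convince yourself that the three spellings really agree, you can compare the results directly; a quick check, assuming the code above has just run:
print(torch.equal(y1, y2), torch.equal(y1, y3))  # expected: True True
print(torch.equal(z1, z2), torch.equal(z1, z3))  # expected: True True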
Single-element tensors If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using item():
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))
Out:
12.0 <class 'float'>
In-place operations Operations that store the result into the operand are called in-place. They are denoted by a _ suffix. For example: x.copy_(y) and x.t_() will change x.
print(tensor, "\n")
tensor.add_(5)
print(tensor)
Out:
tensor([[1., 0., 1., 1.],
[1., 0., 1., 1.],
[1., 0., 1., 1.],
[1., 0., 1., 1.]])
tensor([[6., 5., 6., 6.],
[6., 5., 6., 6.],
[6., 5., 6., 6.],
[6., 5., 6., 6.]])
NOTE
In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.
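As an illustration of why the loss of history matters (an added sketch, not from the original tutorial): modifying a tensor in place when autograd still needs its saved value typically raises a RuntimeError during the backward pass:
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.exp(x)        # the backward of exp() reuses its output
y.add_(1)               # in-place update overwrites the value autograd saved
try:
    y.sum().backward()
except RuntimeError as err:
    print("backward failed:", err)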
Bridge with NumPy
Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.
Tensor to NumPy array
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")
Out:
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]
A change in the tensor reflects in the NumPy array.
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")
Out:
t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]
NumPy array to Tensor
n = np.ones(5)
t = torch.from_numpy(n)
Changes in the NumPy array reflect in the tensor.
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")
Out:
t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
2. DATASETS & DATALOADERS
Code for processing data samples can get messy and hard to maintain; ideally, we want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model. You can find them here: Image Datasets, Text Datasets, and Audio Datasets
Loading a Dataset
Here is an example of how to load the Fashion-MNIST dataset from TorchVision. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.
We load the FashionMNIST Dataset with the following parameters:
- root is the path where the train/test data is stored,
- train specifies training or test dataset,
- download=True downloads the data from the internet if it's not available at root.
- transform and target_transform specify the feature and label transformations
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor()
)
Iterating and Visualizing the Dataset
We can index Datasets manually like a list: training_data[index]. We use matplotlib to visualize some samples in our training data.
labels_map = {
0: "T-Shirt",
1: "Trouser",
2: "Pullover",
3: "Dress",
4: "Coat",
5: "Sandal",
6: "Shirt",
7: "Sneaker",
8: "Bag",
9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
sample_idx = torch.randint(len(training_data), size=(1,)).item()
img, label = training_data[sample_idx]
figure.add_subplot(rows, cols, i)
plt.title(labels_map[label])
plt.axis("off")
plt.imshow(img.squeeze(), cmap="gray")
plt.show()
Creating a Custom Dataset for your files
A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. Take a look at this implementation; the FashionMNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file annotations_file.
In the next sections, we’ll break down what’s happening in each of these functions.
import os
import pandas as pd
from torchvision.io import read_image
class CustomImageDataset(Dataset):
def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
self.img_labels = pd.read_csv(annotations_file)
self.img_dir = img_dir
self.transform = transform
self.target_transform = target_transform
def __len__(self):
return len(self.img_labels)
def __getitem__(self, idx):
img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
image = read_image(img_path)
label = self.img_labels.iloc[idx, 1]
if self.transform:
image = self.transform(image)
if self.target_transform:
label = self.target_transform(label)
return image, label
__init__
The __init__ function is run once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms (covered in more detail in the next section).
The labels.csv file looks like:
tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9
def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
self.img_labels = pd.read_csv(annotations_file)
self.img_dir = img_dir
self.transform = transform
self.target_transform = target_transform
__len__
The __len__ function returns the number of samples in our dataset.
Example:
def __len__(self):
return len(self.img_labels)
__getitem__
The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image's location on disk, converts that to a tensor using read_image, retrieves the corresponding label from the csv data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label in a tuple.
def __getitem__(self, idx):
img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
image = read_image(img_path)
label = self.img_labels.iloc[idx, 1]
if self.transform:
image = self.transform(image)
if self.target_transform:
label = self.target_transform(label)
return image, label
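Putting it together, a hypothetical usage sketch (the CSV and image paths below are placeholders, not files shipped with the tutorial):
# Hypothetical files, for illustration only - substitute your own annotations CSV and image folder
dataset = CustomImageDataset(annotations_file="data/labels.csv", img_dir="data/images")
print(len(dataset))         # number of rows in the annotations CSV
image, label = dataset[0]   # reads the first image from disk and looks up its label
print(image.shape, label)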
Preparing your data for training with DataLoaders
The Dataset retrieves our dataset's features and labels one sample at a time. While training a model, we typically want to pass samples in "minibatches", reshuffle the data at every epoch to reduce model overfitting, and use Python's multiprocessing to speed up data retrieval.
DataLoader is an iterable that abstracts this complexity for us in an easy API.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
Iterate through the DataLoader
We have loaded that dataset into the DataLoader and can iterate through the dataset as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, after we iterate over all batches the data is shuffled (for finer-grained control over the data loading order, take a look at Samplers).
# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")
Out:
Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Label: 0
3. TRANSFORMS
Data does not always come in the final processed form that is required for training machine learning algorithms. We use transforms to perform some manipulation of the data and make it suitable for training.
All TorchVision datasets have two parameters - transform to modify the features and target_transform to modify the labels - that accept callables containing the transformation logic. The torchvision.transforms module offers several commonly-used transforms out of the box.
The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
ds = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor(),
target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)
ToTensor()
ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image's pixel intensity values to the range [0., 1.].
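A small sketch of that behavior, using a synthetic uint8 array in place of a real PIL image:
import numpy as np
from torchvision.transforms import ToTensor
fake_img = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)  # H x W x C, like a grayscale image
t = ToTensor()(fake_img)
print(t.dtype, t.shape)                 # torch.float32 torch.Size([1, 28, 28]) - channels first
print(t.min().item(), t.max().item())   # both values fall inside [0.0, 1.0]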
Lambda Transforms
Lambda transforms apply any user-defined lambda function. Here, we define a function to turn the integer into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and calls scatter_, which assigns a value=1 on the index as given by the label y.
target_transform = Lambda(lambda y: torch.zeros(
10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
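Applying this transform to a label, say 3, should produce the corresponding one-hot vector; a quick check:
print(target_transform(3))
# expected: tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])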
4. BUILD THE NEURAL NETWORK
Neural networks comprise layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.
In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
Get Device for Training
We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let’s check to see if torch.cuda is available, else we continue to use the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
Out:
Using cuda device
Define the Class
We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
nn.ReLU()
)
def forward(self, x):
# flatten each 28x28 image into a 784-dimensional vector
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
We create an instance of NeuralNetwork, move it to the device, and print its structure.
model = NeuralNetwork().to(device)
print(model)
Out:
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
(5): ReLU()
)
)
To use the model, we pass it the input data. This executes the model's forward, along with some background operations. Do not call model.forward() directly!
Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
Out:
Predicted class: tensor([2], device='cuda:0')
Model Layers
Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.
input_image = torch.rand(3,28,28)
print(input_image.size())
Out:
torch.Size([3, 28, 28])
nn.Flatten
We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension (at dim=0) is maintained).
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
Out:
torch.Size([3, 784])
nn.Linear
The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
Out:
torch.Size([3, 20])
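The layer stores its parameters as layer1.weight and layer1.bias, and the output is just flat_image @ layer1.weight.T + layer1.bias; a quick sanity check (an added sketch, not in the original tutorial):
manual = flat_image @ layer1.weight.T + layer1.bias
print(torch.allclose(hidden1, manual))   # expected: True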
nn.ReLU
Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
In this model, we use nn.ReLU between our linear layers, but there are other activations to introduce non-linearity in your model.
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
nn.Sequential
nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.
seq_modules = nn.Sequential(
flatten,
layer1,
nn.ReLU(),
nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
nn.Softmax
The last linear layer of the neural network returns logits - raw values in [-infty, infty] - which are passed to the nn.Softmax module. The logits are scaled to values [0, 1] representing the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
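Since dim=1 is the class dimension here, each row of pred_probab sums to 1; a quick check:
print(pred_probab.sum(dim=1))   # each of the 3 rows sums to 1, up to floating-point rounding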
Model Parameters
Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's parameters() or named_parameters() methods.
In this example, we iterate over each parameter, and print its size and a preview of its values.
print("Model structure: ", model, "\n\n")
for name, param in model.named_parameters():
print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
Out:
Model structure: NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
(5): ReLU()
)
)
Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0153, -0.0177, 0.0286, ..., -0.0073, -0.0272, 0.0314],
[ 0.0067, -0.0347, 0.0343, ..., 0.0347, -0.0196, -0.0094]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0118, -0.0279], device='cuda:0', grad_fn=<SliceBackward>)
Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0319, -0.0182, -0.0130, ..., -0.0155, -0.0372, -0.0199],
[-0.0051, 0.0356, 0.0397, ..., 0.0419, -0.0151, 0.0283]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0041, -0.0237], device='cuda:0', grad_fn=<SliceBackward>)
Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0028, 0.0333, 0.0018, ..., -0.0088, -0.0022, -0.0389],
[-0.0091, 0.0066, -0.0125, ..., -0.0255, 0.0282, 0.0056]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([0.0201, 0.0236], device='cuda:0', grad_fn=<SliceBackward>)
5. AUTOMATIC DIFFERENTIATION
When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradient for any computational graph.
Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:
import torch
x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
A function that we apply to tensors to construct the computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the grad_fn property of a tensor. You can find more information about Function in the documentation.
print('Gradient function for z =',z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)
Out:
Gradient function for z = <AddBackward0 object at 0x7f34c97b5588>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x7f34c97b5588>
Computing Gradients
To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to the parameters, namely, we need ?loss/?w and ?loss/?b under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:
loss.backward()
print(w.grad)
print(b.grad)
Out:
tensor([[0.0540, 0.3248, 0.3158],
[0.0540, 0.3248, 0.3158],
[0.0540, 0.3248, 0.3158],
[0.0540, 0.3248, 0.3158],
[0.0540, 0.3248, 0.3158]])
tensor([0.0540, 0.3248, 0.3158])
Disabling Gradient Tracking
By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation. However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with a torch.no_grad() block:
z = torch.matmul(x, w)+b
print(z.requires_grad)
with torch.no_grad():
z = torch.matmul(x, w)+b
print(z.requires_grad)
Out:
True
False
Another way to achieve the same result is to use the detach() method on the tensor:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)
Out:
False
There are reasons you might want to disable gradient tracking:
- To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (see the sketch after this list).
- To speed up computations when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.
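A minimal sketch of the first case, assuming model is a previously constructed nn.Module (for example, the NeuralNetwork from the previous section):
for param in model.parameters():
    param.requires_grad_(False)          # frozen: autograd will no longer track these parameters
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))                    # 0 - nothing left for an optimizer to update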
6. OPTIMIZING MODEL PARAMETERS
Now that we have a model and data it’s time to train, validate and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an epoch) the model makes a guess about the output, calculates the error in its guess (loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent. For a more detailed walkthrough of this process, check out this video on backpropagation from 3Blue1Brown.
Prerequisite Code
We load the code from the previous sections on Datasets & DataLoaders and Build Model.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.linear_relu_stack = nn.Sequential(
nn.Flatten(),
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
nn.ReLU()
)
def forward(self, x):
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork()
Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates (read more about hyperparameter tuning)
We define the following hyperparameters for training:
Number of Epochs - the number of times to iterate over the dataset
Batch Size - the number of data samples propagated through the network before the parameters are updated
Learning Rate - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
learning_rate = 1e-3
batch_size = 64
epochs = 5
Optimization Loop
Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.
Each epoch consists of two main parts:
The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.
Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to see the Full Implementation of the optimization loop.
Loss Function
When presented with some training data, our untrained network is likely not to give the correct answer. Loss function measures the degree of dissimilarity of obtained result to the target value, and it is the loss function that we want to minimize during training. To calculate the loss we make a prediction using the inputs of our given data sample and compare it against the true data label value.
Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks, and nn.NLLLoss (Negative Log Likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.
We pass our model's output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
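As a quick illustration of what this loss function expects (an added sketch with made-up values, not part of the training code): raw, unnormalized logits of shape (batch, classes) and integer class indices as targets:
dummy_logits = torch.randn(3, 10)             # raw scores for 3 samples and 10 classes
dummy_targets = torch.tensor([2, 7, 7])       # integer class indices
print(loss_fn(dummy_logits, dummy_targets))   # a scalar tensor: the mean loss over the batch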
Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; additionally, there are many different optimizers available in PyTorch such as ADAM and RMSProp, that work better for different kinds of models and data.
We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
Inside the training loop, optimization happens in three steps:
- Call optimizer.zero_grad() to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
- Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.
- Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
Full Implementation
We define train_loop that loops over our optimization code, and test_loop that evaluates the model's performance against our test data.
def train_loop(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
for batch, (X, y) in enumerate(dataloader):
# Compute prediction and loss
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
# Call optimizer.zero_grad() to reset the gradients of the model parameters
optimizer.zero_grad()
# Call loss.backward() to backpropagate the prediction loss
loss.backward()
# Once we have the gradients, call optimizer.step() to adjust the parameters using the gradients collected in the backward pass
optimizer.step()
if batch % 100 == 0:
loss, current = loss.item(), batch * len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test_loop(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
We initialize the loss function and optimizer, and pass it to train_loop and test_loop. Feel free to increase the number of epochs to track the model's improving performance.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train_loop(train_dataloader, model, loss_fn, optimizer)
test_loop(test_dataloader, model, loss_fn)
print("Done!")
7. SAVE AND LOAD THE MODEL
In this section we will look at how to persist model state with saving, loading and running model predictions.
import torch
import torch.onnx as onnx
import torchvision.models as models
Saving and Loading Model Weights
PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:
model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')
To load model weights, you need to create an instance of the same model first, and then load the parameters using the load_state_dict() method.
model = models.vgg16() # we do not specify pretrained=True, i.e. do not load default weights
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
NOTE
Be sure to call model.eval() before inferencing to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.
Saving and Loading Models with Shapes
When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network. We might want to save the structure of this class together with the model, in which case we can pass model (and not model.state_dict()) to the saving function:
torch.save(model, 'model.pth')
We can then load the model like this:
model = torch.load('model.pth')
This approach uses Python pickle module when serializing the model, thus it relies on the actual class definition to be available when loading the model.
Exporting Model to ONNX
PyTorch also has native ONNX export support. Given the dynamic nature of the PyTorch execution graph, however, the export process must traverse the execution graph to produce a persisted ONNX model. For this reason, a test variable of the appropriate size should be passed in to the export routine (in our case, we will create a dummy zero tensor of the correct size):
input_image = torch.zeros((1,3,224,224))
onnx.export(model, input_image, 'model.onnx')
There are a lot of things you can do with ONNX model, including running inference on different platforms and in different programming languages. For more details, we recommend visiting ONNX tutorial.
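For instance, the exported file can be loaded with the onnxruntime package (an assumption: it is not used elsewhere in this tutorial and must be installed separately) to run inference outside of PyTorch; a minimal sketch:
import numpy as np
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)   # same shape as the export-time example
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)   # should be (1, 1000) for the VGG16 model saved above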
Congratulations! You have completed the PyTorch beginner tutorial! Try revisiting the first page to see the tutorial in its entirety again. We hope this tutorial has helped you get started with deep learning on PyTorch. Good luck!