Getting Started with the Deep Learning Framework PyTorch (2)

This post is a set of study notes on DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ.

Here we continue with the second part, A GENTLE INTRODUCTION TO TORCH.AUTOGRAD:

torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train. (automatic differentiation for neural network training)

1. Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

  • Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.
  • Backward Propagation: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.

Training a neural network involves two steps: forward propagation and backward propagation. Backpropagation takes the partial derivatives of the error (loss) with respect to the weights and biases and then updates those weights and biases, repeating the process iteratively. A minimal sketch of this loop is shown below.
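
To make the loop concrete, here is a minimal sketch of one forward/backward/update cycle on a single parameter. The toy model y = w * x and all the values below are assumptions for illustration only, not part of the tutorial:

import torch

# Toy one-parameter model: predict y = w * x with a squared-error loss.
w = torch.tensor(1.0, requires_grad=True)        # the weight we want to learn
x, y_true = torch.tensor(2.0), torch.tensor(6.0)

y_pred = w * x                     # forward propagation: the model's "guess"
loss = (y_pred - y_true) ** 2      # error of the guess
loss.backward()                    # backward propagation: dloss/dw is stored in w.grad

with torch.no_grad():              # one step of gradient descent, done by hand
    w -= 0.1 * w.grad
    w.grad.zero_()

print(w)  # the weight moves from 1.0 toward the ideal value 3.0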

2. Usage in PyTorch

Let’s take a look at a single training step. For this example, we load a pretrained resnet18 model from torchvision. We create a random data tensor to represent a single image with 3 channels, and height & width of 64, and its corresponding label initialized to some random values. Label in pretrained models has shape (1,1000).
We load the pretrained resnet18 model from torchvision. The randomly generated data is a single 3-channel, 64×64 image, and the label is defined as 1000 random values.

import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

Next, we run the input data through the model through each of its layers to make a prediction. This is the forward pass. We use the model’s prediction and the corresponding label to calculate the error (loss). The next step is to backpropagate this error through the network. Backward propagation is kicked off when we call .backward() on the error tensor. Autograd then calculates and stores the gradients for each model parameter in the parameter’s .grad attribute.

prediction = model(data) # forward pass
loss = (prediction - labels).sum()
loss.backward() # backward pass

Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum. Finally, we call .step() to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in .grad.

optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
optim.step() #gradient descent

3. Differentiation in Autograd

We create two tensors a and b with requires_grad=True. This signals to autograd that every operation on them should be tracked. We create another tensor Q from a and b.

Q=3a^3-b^2

\frac{\partial Q}{\partial a}=9a^2

\frac{\partial Q}{\partial b}=-2b

import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2

For the .backward() method: if the tensor it is called on is a scalar, no arguments are needed; if it is a vector, a gradient argument must be passed explicitly. The gradient argument is a tensor with the same shape as that tensor, and here it represents the gradient of Q with respect to itself:
\frac{\mathrm{d} Q}{\mathrm{d} Q} = 1

external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)
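
Equivalently, Q can first be aggregated into a scalar, in which case .backward() needs no gradient argument; the upstream tutorial mentions Q.sum().backward() for this. A small sketch, re-creating a and b so the gradients do not accumulate on top of those above:

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Aggregate Q into a scalar and call backward implicitly
Q.sum().backward()
print(a.grad)  # tensor([36., 81.])  == 9*a**2
print(b.grad)  # tensor([-12., -8.]) == -2*b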

(Left for later) 4. Optional Reading - Vector Calculus using autograd

5. Computational Graph

Conceptually, autograd keeps a record of data (tensors) & all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

Below is a visual representation of the DAG in our example. In the graph, the arrows are in the direction of the forward pass. The nodes represent the backward functions of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors a and b.

DAGs are dynamic in PyTorch: in effect, a brand-new computational graph is rebuilt at every training iteration.
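
The Function objects that make up this DAG can be inspected through each tensor's grad_fn attribute. A small sketch with made-up tensors (x, y, z are assumptions for illustration):

x = torch.tensor([1., 2.], requires_grad=True)  # leaf tensor
y = x * 2                                       # intermediate tensor
z = y.sum()                                     # root of this little graph

print(x.is_leaf, x.grad_fn)       # True None: leaves have no backward function
print(y.is_leaf, y.grad_fn)       # False <MulBackward0 ...>
print(z.grad_fn)                  # <SumBackward0 ...>
print(z.grad_fn.next_functions)   # edges of the DAG pointing back toward the leaves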

6. Exclusion from the DAG

torch.autograd tracks operations on all tensors which have their requires_grad flag set to True. For tensors that don’t require gradients, setting this attribute to False excludes it from the gradient computation DAG.
The output tensor of an operation will require gradients even if only a single input tensor has requires_grad=True.

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

In a NN, parameters that don’t compute gradients are usually called frozen parameters. It is useful to “freeze” part of your model if you know in advance that you won’t need the gradients of those parameters (this offers some performance benefits by reducing autograd computations).

Another common use case where exclusion from the DAG is important is for finetuning a pretrained network. In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels (i.e., fine-tuning the model).

from torch import nn, optim

model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

Let’s say we want to finetune the model on a new dataset with 10 labels. In resnet, the classifier is the last linear layer model.fc. We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier.

Now all parameters in the model, except the parameters of model.fc, are frozen. The only parameters that compute gradients are the weights and bias of model.fc.

The pretrained model outputs 1000 labels; here we switch to 10 labels by replacing the final linear layer, which maps 512 inputs to 1000 outputs, with a (512, 10) linear layer. Apart from this layer, all other parameters stay frozen and no longer compute gradients. This is fine-tuning (finetune).

model.fc = nn.Linear(512, 10)

# Optimize only the classifier
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Notice although we register all the parameters in the optimizer, the only parameters that are computing gradients (and hence updated in gradient descent) are the weights and bias of the classifier.
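
One way to double-check this is to list the parameters that still require gradients. A small verification sketch, not part of the tutorial:

# Only the classifier's parameters should still require gradients
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # 512*10 + 10 = 5130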

The same exclusionary functionality is available as a context manager in torch.no_grad().
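
For example, operations executed inside the torch.no_grad() block are not recorded in the DAG (a brief sketch):

x = torch.rand(5, 5, requires_grad=True)
print((x * 2).requires_grad)   # True: tracked as usual

with torch.no_grad():
    y = x * 2                  # not recorded by autograd
print(y.requires_grad)         # False: y is excluded from the DAG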


(Left for later) Further readings:
In-place operations & Multithreaded Autograd
Example implementation of reverse-mode autodiff

References:

  1. https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
  2. https://www.youtube.com/watch?v=tIeHLnjs5U8
  3. https://blog.csdn.net/PolarisRisingWar/article/details/116069338
  4. https://towardsdatascience.com/machine-learning-for-beginners-an-introduction-to-neural-networks-d49f22d238f9
  5. https://juejin.cn/post/6844903934876729351