Inference and Validation
After a neural network is trained, it can be used to make predictions. This process is usually called inference, a term borrowed from statistics. However, neural networks have a tendency to perform too well on the training data and fail to generalize to data they haven't seen before. This is called overfitting, and it impairs inference performance. To test for overfitting while training, we measure performance on data that isn't in the training set, called the validation set. While monitoring the validation performance during training, we use regularization to avoid overfitting.
The test set contains images just like the training set. Typically we'll set aside 10-20% of the original dataset for the test and validation sets, and train on the rest (a quick way to make such a split is sketched below).
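FashionMNIST, used later in this section, already ships with its own train/test split, but for datasets that don't, torch.utils.data.random_split can carve out a validation set. A minimal sketch on a synthetic dataset (the sizes and the TensorDataset here are illustrative only):

import torch
from torch.utils.data import TensorDataset, random_split

# A synthetic dataset standing in for real data
data = TensorDataset(torch.randn(1000, 784), torch.randint(0, 10, (1000,)))

# Hold out 20% of the examples for validation
n_val = int(0.2 * len(data))
train_set, val_set = random_split(data, [len(data) - n_val, n_val])
print(len(train_set), len(val_set))   # 800 200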
The goal of validation is to measure the model's performance on data that isn't part of the training set. What "performance" means is up to the developer. Typically it's accuracy, the percentage of classes the network predicted correctly. Other options include precision and recall, and the top-5 error rate. Here we'll focus on accuracy. First we'll do a forward pass with one batch from the test set, as sketched below.
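Measuring accuracy on a single batch might look like this (a sketch; it assumes a trained model and a testloader like the ones defined later in this section):

# Get one batch of images and labels from the test set
images, labels = next(iter(testloader))

# The model outputs log-probabilities, so exponentiate to get probabilities
ps = torch.exp(model(images))

# Most likely class for each image
top_p, top_class = ps.topk(1, dim=1)

# Compare predictions with the true labels
equals = top_class == labels.view(*top_class.shape)
accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100:.1f}%')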
1. Overfitting
If we look at the training and validation losses as the network trains, we can see a phenomenon known as overfitting.
The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set, and the validation loss increases. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive for the lowest possible validation loss. One option is to use the version of the model with the lowest validation loss, here the one from around epochs 8-10. This strategy is called early stopping. In practice, you'd save the model frequently as you train, then later choose the model with the lowest validation loss, as sketched below.
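One way to implement this is to checkpoint the model whenever the validation loss improves (a minimal sketch; the stand-in model, the fake per-epoch losses, and the path 'best_model.pth' are illustrative, not from the course code):

import torch
from torch import nn

model = nn.Linear(784, 10)                    # stand-in for the real classifier
val_losses = [0.52, 0.45, 0.41, 0.43, 0.48]   # pretend per-epoch validation losses

best_loss = float('inf')
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:
        # validation loss improved, so checkpoint the weights
        best_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')

# Afterwards, restore the weights with the lowest validation loss
model.load_state_dict(torch.load('best_model.pth'))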
The most common method to reduce overfitting (aside from early stopping) is dropout, where we randomly drop input units. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module, as the short example below shows.
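During training, nn.Dropout zeroes each element with probability p and scales the survivors by 1/(1-p); in evaluation mode it's a no-op. A quick illustration (the all-ones input is just for demonstration):

import torch
from torch import nn

dropout = nn.Dropout(p=0.2)   # each unit is dropped with probability 0.2
x = torch.ones(1, 10)

dropout.train()               # training mode: dropout active
print(dropout(x))             # roughly 2 of 10 units zeroed, the rest scaled to 1.25

dropout.eval()                # evaluation mode: dropout disabled
print(dropout(x))             # all ones, unchanged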
During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So we need to turn dropout off during validation, testing, and whenever we use the network to make predictions. To do this, call model.eval(). This sets the model to evaluation mode, where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for a validation loop looks like this: turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.
# turn off gradients
with torch.no_grad():

    # set model to evaluation mode
    model.eval()

    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
二懂版、 推理
Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model to inference mode with model.eval(). We'll also want to turn off autograd with the torch.no_grad() context manager.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
from torch import nn, optim
import torch.nn.functional as F

## Define your model with dropout added
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure the input tensor is flattened
        x = x.view(x.shape[0], -1)

        # apply dropout after each hidden-layer activation
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # no dropout on the output layer
        x = F.log_softmax(self.fc4(x), dim=1)

        return x
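As a quick sanity check of the dropout behavior, you can run the same input through the model in train mode and in eval mode (a sketch; the throwaway model and random input below are purely illustrative):

check_model = Classifier()
x = torch.randn(1, 784)   # random stand-in for a flattened image

check_model.train()       # dropout active: repeated passes differ
print(torch.allclose(check_model(x), check_model(x)))   # usually False

check_model.eval()        # dropout disabled: passes are deterministic
print(torch.allclose(check_model(x), check_model(x)))   # True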
## Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    model.train()
    for images, labels in trainloader:
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # validation pass after each epoch, with gradients turned off
    test_loss = 0
    accuracy = 0
    with torch.no_grad():
        model.eval()
        for images, labels in testloader:
            log_ps = model(images)
            test_loss += criterion(log_ps, labels).item()

            ps = torch.exp(log_ps)
            top_p, top_class = ps.topk(1, dim=1)
            equals = top_class == labels.view(*top_class.shape)
            accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

    train_losses.append(running_loss/len(trainloader))
    test_losses.append(test_loss/len(testloader))

    print("Epoch: {}/{}..".format(e+1, epochs),
          "Training Loss: {:.3f}..".format(running_loss/len(trainloader)),
          "Test Loss: {:.3f}..".format(test_loss/len(testloader)),
          "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)
# Import helper module (should be in the repo)
import helper
# Test out your network!
model.eval()

dataiter = iter(testloader)
images, labels = next(dataiter)   # use next(); the .next() method is Python 2 only
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img, with autograd off
with torch.no_grad():
    output = model(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')
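If the course's helper module isn't on hand, a minimal stand-in for view_classify might look like this (a sketch assuming the standard Fashion-MNIST class names; only the 'Fashion' version is handled):

import matplotlib.pyplot as plt

def view_classify(img, ps, version='Fashion'):
    # Show the image next to a bar chart of the class probabilities
    classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    ps = ps.data.numpy().squeeze()
    fig, (ax1, ax2) = plt.subplots(figsize=(6, 9), ncols=2)
    ax1.imshow(img.numpy().squeeze(), cmap='gray')
    ax1.axis('off')
    ax2.barh(range(10), ps)
    ax2.set_yticks(range(10))
    ax2.set_yticklabels(classes)
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()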