Deep Learning Notes (7): CNN-1

Lab Preparation

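Be familiar with Python and the basics of numpy and torch
Be familiar with how neural networks are trained and optimized
Building on the lectures, understand the ideas and principles of convolution and convolutional neural networks (CNN)
Know the basic structure of common CNN models such as AlexNet, VGG, and ResNet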

Lab Procedure

1. Convolution and convolutional layers

Implementing convolution with numpy
Convolutional and pooling layers in PyTorch

2. CNN

Implement and train a basic CNN
ResNet
VGG

Convolution

conv

In the lab session we already covered the convolution operation. To convolve a 2D image, we slide the kernel across the image, multiplying and accumulating at each position (as shown in the figure above).

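The convolution function below implements convolution for a 2D single-channel image. It assumes the input kernel is square; padding is the number of zeros added on each of the four edges of the image, and stride is the step by which the kernel window slides.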

import numpy as np

def convolution(img, kernel, padding=1, stride=1):
    """
    img: input image with one channel
    kernel: convolution kernel
    """
    
    h, w = img.shape
    kernel_size = kernel.shape[0]
    
    # height and width of image with padding 
    ph, pw = h + 2 * padding, w + 2 * padding
    padding_img = np.zeros((ph, pw))
    padding_img[padding:h + padding, padding:w + padding] = img
    
    # height and width of output image
    result_h = (h + 2 * padding - kernel_size) // stride + 1
    result_w = (w + 2 * padding - kernel_size) // stride + 1
    
    result = np.zeros((result_h, result_w))
    
    # convolution
    x, y = 0, 0
    for i in range(0, ph - kernel_size + 1, stride):
        for j in range(0, pw - kernel_size + 1, stride):
            roi = padding_img[i:i+kernel_size, j:j+kernel_size]
            result[x, y] = np.sum(roi * kernel)
            y += 1
        y = 0
        x += 1
    return result

Below we run a quick test of our convolution function on an image, filtering the image below with a 3*3 Laplace kernel and with 3*3 and 5*5 Gaussian kernels.

image

from PIL import Image
import matplotlib.pyplot as plt
img = Image.open('pics/lena.jpg').convert('L')
plt.imshow(img, cmap='gray')

#  a Laplace kernel
laplace_kernel = np.array([[-1, -1, -1],
                           [-1, 8, -1],
                           [-1, -1, -1]])

# Gauss kernel with kernel_size=3
gauss_kernel3 = (1/ 16) * np.array([[1, 2, 1], 
                                   [2, 4, 2], 
                                   [1, 2, 1]])

# Gauss kernel with kernel_size=5
gauss_kernel5 = (1/ 84) * np.array([[1, 2, 3, 2, 1],
                                    [2, 5, 6, 5, 2], 
                                    [3, 6, 8, 6, 3],
                                    [2, 5, 6, 5, 2],
                                    [1, 2, 3, 2, 1]])

fig, ax = plt.subplots(1, 3, figsize=(12, 8))

laplace_img = convolution(np.array(img), laplace_kernel, padding=1, stride=1)
ax[0].imshow(Image.fromarray(laplace_img), cmap='gray')
ax[0].set_title('laplace')

gauss3_img = convolution(np.array(img), gauss_kernel3, padding=1, stride=1)
ax[1].imshow(Image.fromarray(gauss3_img), cmap='gray')
ax[1].set_title('gauss kernel_size=3')

gauss5_img = convolution(np.array(img), gauss_kernel5, padding=2, stride=1)
ax[2].imshow(Image.fromarray(gauss5_img), cmap='gray')
ax[2].set_title('gauss kernel_size=5')

Text(0.5,1,'gauss kernel_size=5')

image
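
The scale factors 1/16 and 1/84 in the Gaussian kernels above are not arbitrary: each equals one over the sum of the integer weights, so the normalized kernel sums to 1 and a flat region keeps its brightness after smoothing. The Laplace kernel sums to 0, so flat regions map to 0 and only edges remain. A minimal sketch to check this, using plain lists (the helper name kernel_sum is introduced here for illustration only):

```python
# Integer weights copied from the kernels above, before normalization.
laplace = [[-1, -1, -1],
           [-1,  8, -1],
           [-1, -1, -1]]
gauss3_weights = [[1, 2, 1],
                  [2, 4, 2],
                  [1, 2, 1]]
gauss5_weights = [[1, 2, 3, 2, 1],
                  [2, 5, 6, 5, 2],
                  [3, 6, 8, 6, 3],
                  [2, 5, 6, 5, 2],
                  [1, 2, 3, 2, 1]]


def kernel_sum(kernel):
    """Sum of all entries of a 2D kernel given as a list of lists."""
    return sum(sum(row) for row in kernel)


print(kernel_sum(laplace))         # 0
print(kernel_sum(gauss3_weights))  # 16
print(kernel_sum(gauss5_weights))  # 84
```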

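Above we implemented convolution with a single input channel and a single output channel. CNNs generally use convolutions with multiple input and output channels; to implement multi-channel convolution, we only need to call the convolution function above in a loop.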

def myconv2d(features, weights,  padding=0, stride=1):
    """
    features: input, in_channel * h * w
    weights: kernel, out_channel * in_channel * kernel_size * kernel_size
    return output with out_channel
    """
    in_channel, h, w = features.shape
    out_channel, _, kernel_size, _ = weights.shape
    
    # height and width of output image
    output_h = (h + 2 * padding - kernel_size) // stride + 1
    output_w = (w + 2 * padding - kernel_size) // stride + 1
    output = np.zeros((out_channel, output_h, output_w))
    
    # call convolution out_channel * in_channel times
    for i in range(out_channel):
        weight = weights[i]
        for j in range(in_channel):
            feature_map = features[j]
            kernel = weight[j]
            output[i] += convolution(feature_map, kernel, padding, stride)
    return output

Next, let's test the myconv2d function we just wrote.

input_data=[
           [[0,0,2,2,0,1],
            [0,2,2,0,0,2],
            [1,1,0,2,0,0],
            [2,2,1,1,0,0],
            [2,0,1,2,0,1],
            [2,0,2,1,0,1]],

           [[2,0,2,1,1,1],
            [0,1,0,0,2,2],
            [1,0,0,2,1,0],
            [1,1,1,1,1,1],
            [1,0,1,1,1,2],
            [2,1,2,1,0,2]]
            ]
weights_data=[[ 
               [[ 0, 1, 0],
                [ 1, 1, 1],
                [ 0, 1, 0]],
    
               [[-1, -1, -1],
                [ -1, 8, -1],
                [ -1, -1, -1]] 
           ]]

# numpy array
input_data   = np.array(input_data)
weights_data = np.array(weights_data)

# show the result
print(myconv2d(input_data, weights_data, padding=3, stride=3))

[[[  0.   0.   0.   0.]
  [  0.   8.  10.   0.]
  [  0.  -5.   2.   0.]
  [  0.   0.   0.   0.]]]

PyTorch already provides implementations of convolution and convolutional layers. With the same input, weights, stride, and padding, PyTorch's convolution should give the same result as ours, which the code below verifies.

import torch
import torch.nn.functional as F
input_tensor = torch.tensor(input_data).unsqueeze(0).float()
F.conv2d(input_tensor, weight=torch.tensor(weights_data).float(), bias=None, stride=3, padding=3)

tensor([[[[ 0.,  0.,  0.,  0.],
          [ 0.,  8., 10.,  0.],
          [ 0., -5.,  2.,  0.],
          [ 0.,  0.,  0.,  0.]]]])
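
One reason the two results agree is that both the numpy function above and F.conv2d slide the kernel over the image without flipping it, which is strictly speaking cross-correlation. Textbook convolution rotates the kernel by 180 degrees first. The two kernels in weights_data are unchanged by that rotation, so the distinction would not show up there anyway. The following is a minimal pure-Python sketch of the difference with an asymmetric kernel; the helper names correlate2d_valid and flip2d are introduced here for illustration only.

```python
def correlate2d_valid(img, kernel):
    """Slide kernel over img (no padding, stride 1) without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def flip2d(kernel):
    """Rotate the kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in kernel[::-1]]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
asym_kernel = [[1, 0],
               [0, -1]]

corr = correlate2d_valid(img, asym_kernel)          # no flip, as in the code above
conv = correlate2d_valid(img, flip2d(asym_kernel))  # textbook convolution
print(corr)  # [[-4, -4], [-4, -4]]
print(conv)  # [[4, 4], [4, 4]]
```

Since the kernels in a CNN are learned, the flip makes no practical difference to training; it only matters when comparing against a hand-written filter.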

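Exercise:

The convolution implementation above only handles a square kernel and a padding and stride that are the same in both dimensions. Extend it so that the kernel may have different height and width and padding and stride may each be a two-element tuple (one value per dimension), then test your convolutionV2 with the test input below.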

def convolutionV2(img, kernel, padding=(0,0), stride=(1,1)):
    """
    img: input image with one channel
    kernel: convolution kernel
    """
    
    h, w = img.shape
    kernel_size_h, kernel_size_w = kernel.shape
    padding_h, padding_w = padding[0], padding[1]
    stride_h, stride_w = stride[0], stride[1]
    
    # height and width of image with padding 
    ph, pw = h + 2 * padding_h, w + 2 * padding_w
    padding_img = np.zeros((ph, pw))
    padding_img[padding_h:h + padding_h, padding_w:w + padding_w] = img
    
    # height and width of output image
    result_h = (h + 2 * padding_h - kernel_size_h) // stride_h + 1
    result_w = (w + 2 * padding_w - kernel_size_w) // stride_w + 1
    
    result = np.zeros((result_h, result_w))

    # convolution
    x, y = 0, 0
    for i in range(0, ph - kernel_size_h + 1, stride_h):
        for j in range(0, pw - kernel_size_w + 1, stride_w):
            roi = padding_img[i:i+kernel_size_h, j:j+kernel_size_w]
            result[x, y] = np.sum(roi * kernel)
            y += 1
        y = 0
        x += 1
    return result

# test input
test_input = np.array([[1, 1, 2, 1],
                       [0, 1, 0, 2],
                       [2, 2, 0, 2],
                       [2, 2, 2, 1],
                       [2, 3, 2, 3]])

test_kernel = np.array([[1, 0], [0, 1], [0, 0]])

# output
print(convolutionV2(test_input, test_kernel, padding=(1, 0), stride=(1, 1)))
print('\n')
print(convolutionV2(test_input, test_kernel, padding=(2, 1), stride=(1, 2)))
print('\n')

[[ 1.  2.  1.]
 [ 2.  1.  4.]
 [ 2.  1.  2.]
 [ 4.  4.  1.]
 [ 5.  4.  5.]]

[[ 0.  0.  0.]
 [ 1.  2.  0.]
 [ 0.  1.  1.]
 [ 2.  1.  2.]
 [ 2.  4.  2.]
 [ 2.  4.  1.]
 [ 0.  3.  3.]]

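Convolutional layers

PyTorch provides convolutional and pooling layers for us to use.

A convolutional layer works like the convolution above, and a pooling layer works much like a convolutional layer: a window slides over the input, but it applies a fixed reduction instead of learned weights. The main purpose of a pooling layer is to shrink the spatial size of the features. Common choices are MaxPool (maximum over the sliding window) and AvgPool (mean over the sliding window).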

import torch
import torch.nn as nn


x = torch.randn(1, 1, 32, 32)

conv_layer = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=0)
y = conv_layer(x)
print(x.shape)
print(y.shape)

torch.Size([1, 1, 32, 32])
torch.Size([1, 3, 30, 30])

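Questions:

What are the sizes of the input and output tensors? How many parameters does this convolutional layer have?
If kernel_size=5, stride=2, padding=2, what is the size of the output tensor? Change the parameters in the code above, run it, and answer.
If the input tensor has size N*C*H*W and the convolutional layer in the code above is configured with in_channels=C, out_channels=Cout, kernel_size=k, stride=s, padding=p, what is the size of the output tensor?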

import torch
import torch.nn as nn


x = torch.randn(1, 1, 32, 32)

conv_layer = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=5, stride=2, padding=2)
y = conv_layer(x)
print(x.shape)
print(y.shape)

torch.Size([1, 1, 32, 32])
torch.Size([1, 3, 16, 16])

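Answers:

Input size: 1*1*32*32; output size: 1*3*30*30. Parameter count: $F×F×C_{input}×K+K=3*3*1*3+3=30$, where $F$ is the kernel size, $C_{input}$ the number of input channels, and $K$ the number of output channels (one bias per output channel).
Output size: 1*3*16*16.
$N * C_{out} * ((H+2*p-k)//s+1) * ((W+2*p-k)//s+1)$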
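
The parameter count and output shape of a convolutional layer can be checked without running PyTorch. The following is a small sketch under the same assumptions as the questions above (square kernel, same stride and padding in both dimensions, no dilation); the function names are made up for this example.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Weights (k * k * C_in * C_out) plus one bias per output channel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Shape (N, C_out, H_out, W_out) for an input of shape (N, C, H, W)."""
    n, _, h, w = input_shape
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    return (n, out_channels, out_h, out_w)


print(conv2d_param_count(1, 3, 3))                                         # 30
print(conv2d_output_shape((1, 1, 32, 32), 3, 3))                           # (1, 3, 30, 30)
print(conv2d_output_shape((1, 1, 32, 32), 3, 5, stride=2, padding=2))      # (1, 3, 16, 16)
```

Height and width are computed separately, so a non-square input gives a non-square output.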

# input N * C * H * W
x = torch.randn(1, 1, 4, 4)

# maxpool
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
y = maxpool(x)

# avgpool
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)
z = avgpool(x)

# print the input and both pooled outputs
print(x)
print(y)
print(z)

tensor([[[[-0.7988, -0.6036,  1.0944,  1.0869],
          [ 1.1715, -1.8142, -0.5802,  1.5753],
          [ 1.3232,  0.6413, -0.5604,  0.9052],
          [-0.3123,  1.1715,  0.0411, -0.0606]]]])
tensor([[[[1.1715, 1.5753],
          [1.3232, 0.9052]]]])
tensor([[[[-0.5113,  0.7941],
          [ 0.7059,  0.0813]]]])
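
To see where the pooled values come from, the same 4*4 input can be pooled by hand. This is a pure-Python sketch for a single channel, not how PyTorch implements pooling; pool2d and mean are names introduced here for illustration.

```python
def pool2d(x, size=2, stride=2, reduce=max):
    """Apply reduce to every size x size window of a 2D list."""
    out = []
    for i in range(0, len(x) - size + 1, stride):
        row = []
        for j in range(0, len(x[0]) - size + 1, stride):
            window = [x[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


# The values printed for x above.
x = [[-0.7988, -0.6036,  1.0944,  1.0869],
     [ 1.1715, -1.8142, -0.5802,  1.5753],
     [ 1.3232,  0.6413, -0.5604,  0.9052],
     [-0.3123,  1.1715,  0.0411, -0.0606]]

max_pooled = pool2d(x, reduce=max)
avg_pooled = pool2d(x, reduce=mean)
print(max_pooled)
print([[round(v, 4) for v in row] for row in avg_pooled])
```

Each 2*2 window collapses to one value, so the 4*4 input becomes 2*2 in both cases.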

GPU

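We can train the model on either the CPU or a GPU.
The lab provides a server with 4 GPUs. To check how each GPU is being used, click new->terminal on the server's jupyter home page and run nvidia-smi in the terminal to see the usage of each card, as shown in the figure below.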

image

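The left column of the figure shows the device id (0, 1, 2, 3), fan speed, temperature, performance state, power draw, and so on; the middle column shows the bus-id and memory usage; the right column shows GPU utilization and related information. Pay attention to the memory usage in the middle column: before training, we can pick a GPU based on how much memory is free.
In this lab, replace the 0 in torch.device('cuda:0') with the desired device id to run the program on that GPU.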
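
Picking the GPU by free memory can also be scripted. The sketch below assumes nvidia-smi is on the PATH and uses its CSV query mode; the functions query_free_memory and pick_gpu are hypothetical helpers written for this note, and the sample numbers are made up, not measurements from the lab server.

```python
import subprocess


def query_free_memory():
    """Run nvidia-smi and return CSV text, one 'index, free MiB' line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return result.stdout


def pick_gpu(csv_text):
    """Return the id of the GPU with the most free memory, or None if none is listed."""
    best_id, best_free = None, -1
    for line in csv_text.strip().splitlines():
        gpu_id, free_mib = (int(field) for field in line.split(","))
        if free_mib > best_free:
            best_id, best_free = gpu_id, free_mib
    return best_id


# Made-up sample for a 4-GPU server; on the server, pass query_free_memory() instead.
sample = "0, 1024\n1, 10989\n2, 512\n3, 7000\n"
gpu_id = pick_gpu(sample)
print("cuda:{}".format(gpu_id))  # cuda:1
```

The resulting string can then be passed to torch.device in place of 'cuda:0'.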

CNN (convolutional neural network)

A simple CNN

Next, let's build a simple CNN classifier.
The overall pipeline of this CNN is
Conv2d -> BN (batch normalization) -> activation (ReLU) -> pooling (MaxPooling) ->
Conv2d -> BN (batch normalization) -> activation (ReLU) -> pooling (MaxPooling) ->
fully connected layer (Linear) -> output.

import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision


class MyCNN(nn.Module):
    
    def __init__(self, image_size, num_classes):
        super(MyCNN, self).__init__()
        # conv1: Conv2d -> BN -> ReLU -> MaxPool
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # conv2: Conv2d -> BN -> ReLU -> MaxPool
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # fully connected layer
        self.fc = nn.Linear(32 * (image_size // 4) * (image_size // 4), num_classes)
        

    def forward(self, x):
        """
        input: N * 3 * image_size * image_size
        output: N * num_classes
        """
        x = self.conv1(x)
        x = self.conv2(x)
        # view(x.size(0), -1): change tensor size from (N, C, H, W) to (N, C*H*W)
        x = x.view(x.size(0), -1)
        output = self.fc(x)
        return output
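
The input width of the fully connected layer, 32 * (image_size // 4) * (image_size // 4), follows from the two blocks above: each 3*3 convolution with padding=1 and stride=1 keeps height and width unchanged, and each 2*2 max pool with stride 2 halves them. A small sketch of that arithmetic (fc_in_features is a name introduced for this example):

```python
def fc_in_features(image_size, out_channels=32, num_pools=2):
    """Number of values per sample after the conv blocks are flattened."""
    side = image_size
    for _ in range(num_pools):
        side //= 2  # MaxPool2d(kernel_size=2, stride=2) halves height and width
    return out_channels * side * side


print(fc_in_features(32))  # 2048
```

For 32*32 CIFAR-10 images the feature map shrinks 32 -> 16 -> 8, which gives 32 * 8 * 8 = 2048 inputs to the linear layer.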

That completes a simple CNN model. As in the earlier sessions, we still need code to train and evaluate the network.

def train(model, train_loader, loss_func, optimizer, device):
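    """
    Train the model for one epoch using loss_func and optimizer.
    model: CNN network
    train_loader: a DataLoader object with training data
    loss_func: loss function
    optimizer: optimizer that updates the model parameters
    device: train on cpu or gpu device
    return the average training loss over the epoch
    """
    # evaluate() switches the model to eval mode, so switch back to training
    # mode here; otherwise BatchNorm keeps using its running statistics
    model.train()
    total_loss = 0
    # train the model using minibatch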
    for i, (images, targets) in enumerate(train_loader):
        images = images.to(device)
        targets = targets.to(device)

        # forward
        outputs = model(images)
        loss = loss_func(outputs, targets)

        # backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        
        # every 100 iteration, print loss
        if (i + 1) % 100 == 0:
            print ("Step [{}/{}] Train Loss: {:.4f}"
                   .format(i+1, len(train_loader), loss.item()))
    return total_loss / len(train_loader)

def evaluate(model, val_loader, device):
    """
    model: CNN networks
    val_loader: a Dataloader object with validation data
    device: evaluate on cpu or gpu device
    return classification accuracy of the model on val dataset
    """
    # evaluate the model
    model.eval()
    # context-manager that disabled gradient computation
    with torch.no_grad():
        correct = 0
        total = 0
        
        for i, (images, targets) in enumerate(val_loader):
            # device: cpu or gpu
            images = images.to(device)
            targets = targets.to(device)
            
            
            outputs = model(images)
            
            # torch.max returns the maximum value of each row of the input
            # tensor along the given dimension dim; the second return value is
            # the index of each maximum value found (argmax)
            _, predicted = torch.max(outputs.data, dim=1)
            
            
            correct += (predicted == targets).sum().item()
            
            total += targets.size(0)
            
        accuracy = correct / total
        print('Accuracy on Test Set: {:.4f} %'.format(100 * accuracy))
        return accuracy

def save_model(model, save_path):
    # save model
    torch.save(model.state_dict(), save_path)

import matplotlib.pyplot as plt
def show_curve(ys, title):
    """
    plot curve for loss and accuracy
    Args:
        ys: loss or acc list
        title: loss or accuracy
    """
    x = np.array(range(len(ys)))
    y = np.array(ys)
    plt.plot(x, y, c='b')
    plt.axis()
    plt.title('{} curve'.format(title))
    plt.xlabel('epoch')
    plt.ylabel('{}'.format(title))
    plt.show()

Preparing the data and training the model

Next, we use the CIFAR10 dataset to train our CNN model.

CIFAR-10: the dataset contains 60000 color images of size 32*32, divided into 10 classes with 6000 images per class. 50000 images are used for training, split into 5 training batches of 10000 images each; the remaining 10000 are used for testing and form a single batch. In this lab we train our model on CIFAR-10, which is available directly through torchvision.datasets.CIFAR10.

image

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# mean and std of cifar10 in 3 channels 
cifar10_mean = (0.49, 0.48, 0.45)
cifar10_std = (0.25, 0.24, 0.26)

# define transform operations of train dataset 
train_transform = transforms.Compose([
    # data augmentation
    transforms.Pad(4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),

    transforms.ToTensor(),
    transforms.Normalize(cifar10_mean, cifar10_std)])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(cifar10_mean, cifar10_std)])

# torchvision.datasets provide CIFAR-10 dataset for classification
train_dataset = torchvision.datasets.CIFAR10(root='./data/',
                                             train=True, 
                                             transform=train_transform,
                                             download=True)

test_dataset = torchvision.datasets.CIFAR10(root='./data/',
                                            train=False, 
                                            transform=test_transform)

# Data loader: provides single- or multi-process iterators over the dataset.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=100, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=100, 
                                          shuffle=False)

Files already downloaded and verified
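
In the transforms above, ToTensor rescales pixel values from 0..255 to 0..1, and Normalize then computes (value - mean) / std for each channel. A pure-Python sketch for a single pixel, using the same mean and std tuples (normalize_pixel is a helper introduced here for illustration):

```python
cifar10_mean = (0.49, 0.48, 0.45)
cifar10_std = (0.25, 0.24, 0.26)


def normalize_pixel(rgb_uint8, mean=cifar10_mean, std=cifar10_std):
    """Scale an (R, G, B) pixel from 0..255 to 0..1, then standardize each channel."""
    scaled = [v / 255 for v in rgb_uint8]  # value range after ToTensor
    return [(v - m) / s for v, m, s in zip(scaled, mean, std)]  # Normalize


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print(black)
print(white)
```

After normalization the inputs are roughly centered around 0 with a spread of about 1 per channel, which tends to make optimization better behaved than feeding raw 0..255 values.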

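Training uses the cross-entropy loss function and the Adam optimizer to train our classifier network.
Read the code below. In the original lab the forward and backward passes were left as a To-Do to fill in from what we learned earlier; here they are already completed in the train function above.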
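
nn.CrossEntropyLoss takes the raw outputs of the network (logits) and combines a log-softmax with the negative log-likelihood of the target class. For one sample this is what it computes, sketched in pure Python (the function cross_entropy below is written for this note; the PyTorch loss additionally averages over the batch by default):

```python
import math


def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed with the log-sum-exp trick."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


uniform_loss = cross_entropy([0.0] * 10, target=3)
confident_loss = cross_entropy([5.0, 0.0, 0.0], target=0)
print(uniform_loss)    # ln(10), about 2.3026
print(confident_loss)
```

A 10-class model that has learned nothing yet outputs roughly equal logits, so its loss starts near ln(10), about 2.30; values well below that in a training log indicate the model is learning.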

def fit(model, num_epochs, optimizer, device):
    """
    Train and evaluate a classifier for num_epochs epochs.
    We use optimizer and cross entropy loss to train the model. 
    Args: 
        model: CNN network
        num_epochs: the number of training epochs
        optimizer: optimize the loss function
    """
        
    # loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    
    model.to(device)
    loss_func.to(device)
    
    # log train loss and test accuracy
    losses = []
    accs = []
    
    for epoch in range(num_epochs):
        
        print('Epoch {}/{}:'.format(epoch + 1, num_epochs))
        # train step
        loss = train(model, train_loader, loss_func, optimizer, device)
        losses.append(loss)
        
        # evaluate step
        accuracy = evaluate(model, test_loader, device)
        accs.append(accuracy)
        
    
    # show curve
    show_curve(losses, "train loss")
    show_curve(accs, "test accuracy")

# hyper parameters
num_epochs = 10
lr = 0.01
image_size = 32
num_classes = 10

# declare and define an object of MyCNN
mycnn = MyCNN(image_size, num_classes)
print(mycnn)

MyCNN(
  (conv1): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Linear(in_features=2048, out_features=10, bias=True)
)

# Device configuration, cpu, cuda:0/1/2/3 available
device = torch.device('cuda:0')

optimizer = torch.optim.Adam(mycnn.parameters(), lr=lr)

# start training on cifar10 dataset
fit(mycnn, num_epochs, optimizer, device)

Epoch 1/10:
Step [100/500] Train Loss: 1.8075
Step [200/500] Train Loss: 1.6811
Step [300/500] Train Loss: 1.6177
Step [400/500] Train Loss: 1.3389
Step [500/500] Train Loss: 1.2736
Accuracy on Test Set: 53.9500 %
Epoch 2/10:
Step [100/500] Train Loss: 1.5978
Step [200/500] Train Loss: 1.2951
Step [300/500] Train Loss: 1.3162
Step [400/500] Train Loss: 1.2874
Step [500/500] Train Loss: 1.1236
Accuracy on Test Set: 61.5300 %
Epoch 3/10:
Step [100/500] Train Loss: 1.3468
Step [200/500] Train Loss: 1.3069
Step [300/500] Train Loss: 1.1912
Step [400/500] Train Loss: 1.2451
Step [500/500] Train Loss: 1.3067
Accuracy on Test Set: 60.2800 %
Epoch 4/10:
Step [100/500] Train Loss: 1.3471
Step [200/500] Train Loss: 1.2564
Step [300/500] Train Loss: 1.1971
Step [400/500] Train Loss: 1.1134
Step [500/500] Train Loss: 1.3163
Accuracy on Test Set: 62.7700 %
Epoch 5/10:
Step [100/500] Train Loss: 1.2081
Step [200/500] Train Loss: 1.0366
Step [300/500] Train Loss: 1.0514
Step [400/500] Train Loss: 1.1292
Step [500/500] Train Loss: 1.0381
Accuracy on Test Set: 64.4700 %
Epoch 6/10:
Step [100/500] Train Loss: 0.9613
Step [200/500] Train Loss: 0.9588
Step [300/500] Train Loss: 1.1643
Step [400/500] Train Loss: 0.9842
Step [500/500] Train Loss: 1.0876
Accuracy on Test Set: 64.2500 %
Epoch 7/10:
Step [100/500] Train Loss: 1.1227
Step [200/500] Train Loss: 1.1365
Step [300/500] Train Loss: 1.2146
Step [400/500] Train Loss: 1.0229
Step [500/500] Train Loss: 1.3981
Accuracy on Test Set: 65.6000 %
Epoch 8/10:
Step [100/500] Train Loss: 1.1427
Step [200/500] Train Loss: 0.9221
Step [300/500] Train Loss: 1.1509
Step [400/500] Train Loss: 0.9516
Step [500/500] Train Loss: 1.1159
Accuracy on Test Set: 65.5400 %
Epoch 9/10:
Step [100/500] Train Loss: 1.0614
Step [200/500] Train Loss: 1.0258
Step [300/500] Train Loss: 0.9749
Step [400/500] Train Loss: 0.9400
Step [500/500] Train Loss: 1.2101
Accuracy on Test Set: 66.7200 %
Epoch 10/10:
Step [100/500] Train Loss: 1.2158
Step [200/500] Train Loss: 1.1549
Step [300/500] Train Loss: 0.9802
Step [400/500] Train Loss: 0.9733
Step [500/500] Train Loss: 1.0673
Accuracy on Test Set: 66.6800 %

image

ResNet

Next, let's implement a more complex CNN.
ResNet is short for residual network. The ResNet architecture uses two kinds of residual block: one chains two 3*3 convolutions together as a residual block, and the other chains three convolutions (1*1, 3*3, 1*1) together as a residual block. Both are shown in the figure below.

image

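We implement a ResidualBlock using the left block as the example. The two convolutions may change the size of the tensor, so the output may no longer match the input. To make the two addable, whenever the output and input tensor sizes differ we apply downsample (passed in from outside) to the input so that the sizes match.

Now complete the forward function at the To-Do below to finish the ResidualBlock implementation, and run it.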

# 3x3 convolution
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                     stride=stride, padding=1, bias=False)

# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
        
    def forward(self, x):
        """
        Defines the computation performed at every call.
        x: N * C * H * W
        """
        residual = x
        # if the size of input x changes, use downsample to change the size of residual
        if self.downsample is not None:
            residual = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        
        """
        To-Do: add code here
        """
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        
        out += residual
        out = self.relu(out)
        return out

Below is a ResNet implementation for the cifar10 dataset. It first applies a conv3x3, then passes through 3 layers of residual blocks (each layer may contain several ResidualBlocks, as set by the numbers in the layers list), then a global average pooling layer, and finally a linear layer.

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        """
        block: ResidualBlock or other block
        layers: a list with 3 positive num.
        """
        super(ResNet, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(3, 16)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        # layer1: image size 32
        self.layer1 = self.make_layer(block, 16, num_blocks=layers[0])
        # layer2: image size 32 -> 16
        self.layer2 = self.make_layer(block, 32, num_blocks=layers[1], stride=2)
        # layer3: image size 16 -> 8
        self.layer3 = self.make_layer(block, 64, num_blocks=layers[2], stride=2)
        # global avg pool: image size 8 -> 1
        self.avg_pool = nn.AvgPool2d(8)
    
        self.fc = nn.Linear(64, num_classes)
        
    def make_layer(self, block, out_channels, num_blocks, stride=1):
        """
        make a layer with num_blocks blocks.
        """
        
        downsample = None
        if (stride != 1) or (self.in_channels != out_channels):
            # use Conv2d with stride to downsample
            downsample = nn.Sequential(
                conv3x3(self.in_channels, out_channels, stride=stride),
                nn.BatchNorm2d(out_channels))
        
        # first block with downsample
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        
        self.in_channels = out_channels
        # add num_blocks - 1 blocks
        for i in range(1, num_blocks):
            layers.append(block(out_channels, out_channels))
            
        # return a layer containing layers
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avg_pool(out)
        # view: here change output size from 4 dimensions to 2 dimensions
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

resnet = ResNet(ResidualBlock, [2, 2, 2])
print(resnet)

ResNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (layer1): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): ResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avg_pool): AvgPool2d(kernel_size=8, stride=8, padding=0)
  (fc): Linear(in_features=64, out_features=10, bias=True)
)
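
The printed structure can be cross-checked against the rule in make_layer: only the first block of a layer can change the stride or channel count, so only that block may need a downsample branch. The sketch below replays that rule in pure Python for the widths (16, 32, 64) and strides (1, 2, 2) used by the ResNet class above; plan_blocks is a helper introduced here for illustration and is not part of the model code.

```python
def plan_blocks(layers, widths=(16, 32, 64), strides=(1, 2, 2), in_channels=16):
    """List (layer, block, in_ch, out_ch, stride, has_downsample) for every block."""
    plan = []
    for layer_idx, (num_blocks, out_ch, stride) in enumerate(
            zip(layers, widths, strides), start=1):
        for block_idx in range(num_blocks):
            # only the first block of a layer uses the layer's stride
            s = stride if block_idx == 0 else 1
            needs_downsample = (s != 1) or (in_channels != out_ch)
            plan.append((layer_idx, block_idx, in_channels, out_ch, s, needs_downsample))
            in_channels = out_ch
    return plan


plan = plan_blocks([2, 2, 2])
for row in plan:
    print(row)
```

For layers [2, 2, 2] only the first blocks of layer2 and layer3 get a downsample branch, which matches the two (downsample) entries in the printed model.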

Train the ResNet with the fit function and observe how the results change.

# Hyper-parameters
num_epochs = 10
lr = 0.001
# Device configuration
device = torch.device('cuda:0')
# optimizer
optimizer = torch.optim.Adam(resnet.parameters(), lr=lr)

fit(resnet, num_epochs, optimizer, device)

Epoch 1/10:
Step [100/500] Train Loss: 1.0425
Step [200/500] Train Loss: 1.2821
Step [300/500] Train Loss: 1.0189
Step [400/500] Train Loss: 1.0343
Step [500/500] Train Loss: 1.0760
Accuracy on Test Set: 63.9400 %
Epoch 2/10:
Step [100/500] Train Loss: 0.9691
Step [200/500] Train Loss: 0.9280
Step [300/500] Train Loss: 1.1253
Step [400/500] Train Loss: 1.0832
Step [500/500] Train Loss: 0.7534
Accuracy on Test Set: 63.9400 %
Epoch 3/10:
Step [100/500] Train Loss: 0.9576
Step [200/500] Train Loss: 0.8765
Step [300/500] Train Loss: 0.7416
Step [400/500] Train Loss: 0.8020
Step [500/500] Train Loss: 0.7128
Accuracy on Test Set: 68.0000 %
Epoch 4/10:
Step [100/500] Train Loss: 1.0099
Step [200/500] Train Loss: 0.9608
Step [300/500] Train Loss: 0.8774
Step [400/500] Train Loss: 0.7870
Step [500/500] Train Loss: 0.7058
Accuracy on Test Set: 68.5800 %
Epoch 5/10:
Step [100/500] Train Loss: 0.8077
Step [200/500] Train Loss: 0.5876
Step [300/500] Train Loss: 0.8926
Step [400/500] Train Loss: 0.8441
Step [500/500] Train Loss: 0.9973
Accuracy on Test Set: 72.6900 %
Epoch 6/10:
Step [100/500] Train Loss: 0.8229
Step [200/500] Train Loss: 0.7058
Step [300/500] Train Loss: 0.7750
Step [400/500] Train Loss: 0.7295
Step [500/500] Train Loss: 0.8246
Accuracy on Test Set: 72.6600 %
Epoch 7/10:
Step [100/500] Train Loss: 0.7068
Step [200/500] Train Loss: 0.6928
Step [300/500] Train Loss: 0.8502
Step [400/500] Train Loss: 0.7325
Step [500/500] Train Loss: 0.6583
Accuracy on Test Set: 75.1100 %
Epoch 8/10:
Step [100/500] Train Loss: 0.6834
Step [200/500] Train Loss: 0.8615
Step [300/500] Train Loss: 0.7363
Step [400/500] Train Loss: 0.8829
Step [500/500] Train Loss: 0.7208
Accuracy on Test Set: 74.1100 %
Epoch 9/10:
Step [100/500] Train Loss: 0.6611
Step [200/500] Train Loss: 0.5346
Step [300/500] Train Loss: 0.4550
Step [400/500] Train Loss: 0.7190
Step [500/500] Train Loss: 0.5672
Accuracy on Test Set: 76.9400 %
Epoch 10/10:
Step [100/500] Train Loss: 0.5207
Step [200/500] Train Loss: 0.6895
Step [300/500] Train Loss: 0.5880
Step [400/500] Train Loss: 0.6893
Step [500/500] Train Loss: 0.7157
Accuracy on Test Set: 77.9500 %

image

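Exercise

Try changing the learning rate lr, using the SGD or Adam optimizer, and train for 10 epochs to improve ResNet's accuracy on the test set. Note that the run below reuses the resnet object trained above, so it continues from those weights and does not start from scratch; its accuracy is therefore not directly comparable with the first run.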

# Hyper-parameters
num_epochs = 10
lr = 0.0015
# Device configuration
device = torch.device('cuda:0')
# optimizer
optimizer = torch.optim.Adam(resnet.parameters(), lr=lr)

fit(resnet, num_epochs, optimizer, device)

Epoch 1/10:
Step [100/500] Train Loss: 0.7118
Step [200/500] Train Loss: 0.4573
Step [300/500] Train Loss: 0.4669
Step [400/500] Train Loss: 0.2568
Step [500/500] Train Loss: 0.4969
Accuracy on Test Set: 80.3800 %
Epoch 2/10:
Step [100/500] Train Loss: 0.4439
Step [200/500] Train Loss: 0.4941
Step [300/500] Train Loss: 0.5434
Step [400/500] Train Loss: 0.4898
Step [500/500] Train Loss: 0.4460
Accuracy on Test Set: 82.1700 %
Epoch 3/10:
Step [100/500] Train Loss: 0.4875
Step [200/500] Train Loss: 0.3971
Step [300/500] Train Loss: 0.5229
Step [400/500] Train Loss: 0.6836
Step [500/500] Train Loss: 0.4133
Accuracy on Test Set: 78.1500 %
Epoch 4/10:
Step [100/500] Train Loss: 0.3835
Step [200/500] Train Loss: 0.5045
Step [300/500] Train Loss: 0.4055
Step [400/500] Train Loss: 0.3561
Step [500/500] Train Loss: 0.4818
Accuracy on Test Set: 83.5100 %
Epoch 5/10:
Step [100/500] Train Loss: 0.3647
Step [200/500] Train Loss: 0.5745
Step [300/500] Train Loss: 0.2970
Step [400/500] Train Loss: 0.4631
Step [500/500] Train Loss: 0.3952
Accuracy on Test Set: 82.9100 %
Epoch 6/10:
Step [100/500] Train Loss: 0.4992
Step [200/500] Train Loss: 0.4990
Step [300/500] Train Loss: 0.4383
Step [400/500] Train Loss: 0.5731
Step [500/500] Train Loss: 0.3213
Accuracy on Test Set: 83.0500 %
Epoch 7/10:
Step [100/500] Train Loss: 0.3208
Step [200/500] Train Loss: 0.3100
Step [300/500] Train Loss: 0.4275
Step [400/500] Train Loss: 0.4537
Step [500/500] Train Loss: 0.4117
Accuracy on Test Set: 83.2300 %
Epoch 8/10:
Step [100/500] Train Loss: 0.4122
Step [200/500] Train Loss: 0.4852
Step [300/500] Train Loss: 0.4390
Step [400/500] Train Loss: 0.3829
Step [500/500] Train Loss: 0.3836
Accuracy on Test Set: 83.1100 %
Epoch 9/10:
Step [100/500] Train Loss: 0.3871
Step [200/500] Train Loss: 0.3587
Step [300/500] Train Loss: 0.2804
Step [400/500] Train Loss: 0.2926
Step [500/500] Train Loss: 0.4059
Accuracy on Test Set: 83.7800 %
Epoch 10/10:
Step [100/500] Train Loss: 0.3101
Step [200/500] Train Loss: 0.4478
Step [300/500] Train Loss: 0.3073
Step [400/500] Train Loss: 0.3947
Step [500/500] Train Loss: 0.3530
Accuracy on Test Set: 84.1200 %
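
As a starting point for the exercise above, the two optimizers can be swapped behind a small helper. This is only a sketch: `make_optimizer`, the momentum value of 0.9, and the tiny stand-in model are assumptions made for illustration and are not part of the original lab code. In the lab you would pass `resnet` instead and hand the result to `fit`.

```python
import torch
import torch.nn as nn


def make_optimizer(model, name="adam", lr=1e-3):
    """Hypothetical helper: return an SGD or Adam optimizer for `model`."""
    if name == "sgd":
        # Momentum 0.9 is a common choice, assumed here; tune it together with lr.
        return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(model.parameters(), lr=lr)
    raise ValueError("unknown optimizer: " + name)


# Stand-in model so the snippet runs on its own; use `resnet` in the lab.
toy_model = nn.Linear(4, 2)
sgd = make_optimizer(toy_model, "sgd", lr=0.1)
adam = make_optimizer(toy_model, "adam", lr=0.0015)
print(type(sgd).__name__, sgd.param_groups[0]["lr"])
print(type(adam).__name__, adam.param_groups[0]["lr"])
```

SGD often needs a larger learning rate than Adam, so it is worth tuning lr separately for each optimizer rather than reusing one value.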

image

Assignment

The figure below shows an SE module embedded in a ResNet residual block.

image

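Here, global pooling denotes a global pooling layer (it pools the input down to size 1*1), turning a c*h*w input into a c*1*1 output. FC denotes a fully connected (linear) layer; ReLU is used as the activation between the two FC layers. After the two FC layers, a sigmoid activation is applied. Finally, the resulting c values are multiplied channel-wise with the original c*h*w input, giving a c*h*w output.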
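
The data flow just described can be traced on a random tensor. The sketch below uses plain tensor operations with random, bias-free weights `w1` and `w2`; these, the sizes n, c, h, w, and the reduction ratio r are all illustrative assumptions, not trained parameters or values from the lab.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, c, h, w, r = 2, 32, 8, 8, 16      # batch, channels, height, width, reduction
x = torch.randn(n, c, h, w)

# Random stand-ins for the two FC layers (no bias, for brevity).
w1 = torch.randn(c // r, c)
w2 = torch.randn(c, c // r)

s = x.mean(dim=(2, 3))               # global average pooling: (n, c)
z = F.relu(s @ w1.t())               # first FC + ReLU: (n, c // r)
g = torch.sigmoid(z @ w2.t())        # second FC + sigmoid: (n, c), one gate per channel
y = x * g.view(n, c, 1, 1)           # channel-wise rescaling: (n, c, h, w)

print(s.shape, z.shape, g.shape, y.shape)
```

Each gate lies between 0 and 1, so the module can only scale a channel down relative to its input, never amplify it.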

Complete the code below to implement the SE-ResNet block.

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        # AdaptiveAvgPool2d((1, 1)) pools every channel down to 1 x 1, whatever the input size.
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.relu = nn.ReLU(inplace=True)
        self.fc1 = nn.Linear(channel, channel//reduction)
        self.fc2 = nn.Linear(channel//reduction, channel)
        self.sigmoid = nn.Sigmoid()
        

    def forward(self, x):
        out = self.avg_pool(x)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.shape[0], -1, 1, 1)
        return x*out

class SEResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None, reduction=16):
        super(SEResidualBlock, self).__init__()
        """
        To-Do: add code here
        """
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.se = SELayer(out_channels, reduction)
        self.downsample = downsample
        
    def forward(self, x):

        residual = x
        """
        To-Do: add code here
        """
        if self.downsample is not None:
            residual = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)
        out = out + residual
        out = self.relu(out)
        return out

se_resnet = ResNet(SEResidualBlock, [2, 2, 2])
print(se_resnet)

ResNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (layer1): Sequential(
    (0): SEResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=16, out_features=1, bias=True)
        (fc2): Linear(in_features=1, out_features=16, bias=True)
        (sigmoid): Sigmoid()
      )
    )
    (1): SEResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=16, out_features=1, bias=True)
        (fc2): Linear(in_features=1, out_features=16, bias=True)
        (sigmoid): Sigmoid()
      )
    )
  )
  (layer2): Sequential(
    (0): SEResidualBlock(
      (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=32, out_features=2, bias=True)
        (fc2): Linear(in_features=2, out_features=32, bias=True)
        (sigmoid): Sigmoid()
      )
      (downsample): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): SEResidualBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=32, out_features=2, bias=True)
        (fc2): Linear(in_features=2, out_features=32, bias=True)
        (sigmoid): Sigmoid()
      )
    )
  )
  (layer3): Sequential(
    (0): SEResidualBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=64, out_features=4, bias=True)
        (fc2): Linear(in_features=4, out_features=64, bias=True)
        (sigmoid): Sigmoid()
      )
      (downsample): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): SEResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SELayer(
        (avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
        (relu): ReLU(inplace)
        (fc1): Linear(in_features=64, out_features=4, bias=True)
        (fc2): Linear(in_features=4, out_features=64, bias=True)
        (sigmoid): Sigmoid()
      )
    )
  )
  (avg_pool): AvgPool2d(kernel_size=8, stride=8, padding=0)
  (fc): Linear(in_features=64, out_features=10, bias=True)
)
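
The `fc1`/`fc2` sizes in the printout follow from the integer division `channel // reduction` in `SELayer`. The sketch below reproduces that arithmetic; the helper names `se_hidden_width` and `se_param_count` are hypothetical and introduced only for this illustration.

```python
def se_hidden_width(channels, reduction=16):
    """Width of the bottleneck between fc1 and fc2."""
    return channels // reduction


def se_param_count(channels, reduction=16):
    """Weights plus biases of the two Linear layers in one SELayer."""
    hidden = se_hidden_width(channels, reduction)
    fc1 = channels * hidden + hidden      # Linear(channels, hidden)
    fc2 = hidden * channels + channels    # Linear(hidden, channels)
    return fc1 + fc2


for channels in (16, 32, 64):
    print(channels, se_hidden_width(channels), se_param_count(channels))
```

With `reduction=16`, the 16-channel stage squeezes everything through a single hidden unit. Trying a smaller `reduction` is one possible experiment; its effect on accuracy has not been tested here.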

# Hyper-parameters
num_epochs = 10
lr = 0.001
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# optimizer
optimizer = torch.optim.Adam(se_resnet.parameters(), lr=lr)

fit(se_resnet, num_epochs, optimizer, device)

Epoch 1/10:
Step [100/500] Train Loss: 1.6276
Step [200/500] Train Loss: 1.4714
Step [300/500] Train Loss: 1.4851
Step [400/500] Train Loss: 1.2222
Step [500/500] Train Loss: 1.2060
Accuracy on Test Set: 48.9400 %
Epoch 2/10:
Step [100/500] Train Loss: 2.2510
Step [200/500] Train Loss: 2.0723
Step [300/500] Train Loss: 1.8598
Step [400/500] Train Loss: 2.0755
Step [500/500] Train Loss: 1.7243
Accuracy on Test Set: 33.7100 %
Epoch 3/10:
Step [100/500] Train Loss: 1.7078
Step [200/500] Train Loss: 1.5886
Step [300/500] Train Loss: 1.5629
Step [400/500] Train Loss: 1.5738
Step [500/500] Train Loss: 1.4202
Accuracy on Test Set: 48.1800 %
Epoch 4/10:
Step [100/500] Train Loss: 1.5383
Step [200/500] Train Loss: 1.4838
Step [300/500] Train Loss: 1.3516
Step [400/500] Train Loss: 1.4415
Step [500/500] Train Loss: 1.1955
Accuracy on Test Set: 54.4100 %
Epoch 5/10:
Step [100/500] Train Loss: 1.2495
Step [200/500] Train Loss: 1.2082
Step [300/500] Train Loss: 1.1445
Step [400/500] Train Loss: 1.0991
Step [500/500] Train Loss: 1.1674
Accuracy on Test Set: 56.0800 %
Epoch 6/10:
Step [100/500] Train Loss: 1.0126
Step [200/500] Train Loss: 1.1029
Step [300/500] Train Loss: 0.8674
Step [400/500] Train Loss: 0.9355
Step [500/500] Train Loss: 1.1729
Accuracy on Test Set: 61.1100 %
Epoch 7/10:
Step [100/500] Train Loss: 1.1173
Step [200/500] Train Loss: 1.2414
Step [300/500] Train Loss: 1.1263
Step [400/500] Train Loss: 1.0653
Step [500/500] Train Loss: 0.9470
Accuracy on Test Set: 61.7000 %
Epoch 8/10:
Step [100/500] Train Loss: 1.0067
Step [200/500] Train Loss: 0.9689
Step [300/500] Train Loss: 0.9487
Step [400/500] Train Loss: 1.1266
Step [500/500] Train Loss: 1.1523
Accuracy on Test Set: 66.2600 %
Epoch 9/10:
Step [100/500] Train Loss: 0.7574
Step [200/500] Train Loss: 0.7837
Step [300/500] Train Loss: 0.9518
Step [400/500] Train Loss: 0.9028
Step [500/500] Train Loss: 0.8175
Accuracy on Test Set: 66.4400 %
Epoch 10/10:
Step [100/500] Train Loss: 0.7346
Step [200/500] Train Loss: 0.7445
Step [300/500] Train Loss: 0.8594
Step [400/500] Train Loss: 0.9784
Step [500/500] Train Loss: 0.8334
Accuracy on Test Set: 67.4600 %

image

VGG

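Next, let's read an implementation of the VGG network. VGGNet uses only 3*3 convolution kernels and 2*2 pooling kernels, and improves performance by making the network progressively deeper. VGG showed that increasing the depth of a convolutional neural network and using small convolution kernels both contribute substantially to the final classification accuracy.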

image
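
The case for small kernels can be checked with simple arithmetic: two stacked 3*3 convolutions (stride 1) cover the same 5*5 window as a single 5*5 convolution while using fewer weights. The helper names and the channel count of 64 below are assumptions chosen for illustration; biases and BatchNorm parameters are ignored.

```python
def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` inputs and outputs."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


channels = 64
print(stacked_receptive_field(3, 2))     # window seen by two 3*3 layers
print(2 * conv_weights(3, channels))     # weights in two 3*3 layers
print(conv_weights(5, channels))         # weights in one 5*5 layer
```

The stacked version also places an extra non-linearity between the two layers, which a single large kernel does not have.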

Below is a simplified VGG implementation for training on CIFAR-10.
If you have time, read through it and train it.

import math

class VGG(nn.Module):
    def __init__(self, cfg):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg)
        # linear layer
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        """
        cfg: a list describing the layers to build
            'M': MaxPool2d; number: Conv2d(out_channels=number) -> BN -> ReLU
        """
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        # A 1x1 average pool with stride 1 leaves the tensor unchanged; kept as in the original code.
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
vggnet = VGG(cfg['VGG11'])
print(vggnet)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU(inplace)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): ReLU(inplace)
    (11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): ReLU(inplace)
    (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (16): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (17): ReLU(inplace)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (20): ReLU(inplace)
    (21): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (22): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (23): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (24): ReLU(inplace)
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (26): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (27): ReLU(inplace)
    (28): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (29): AvgPool2d(kernel_size=1, stride=1, padding=0)
  )
  (classifier): Linear(in_features=512, out_features=10, bias=True)
)
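
The classifier takes 512 input features because each of the five max-pooling layers halves the spatial size, so a 32*32 CIFAR-10 image is reduced to 1*1 with 512 channels. The sketch below traces this through the configuration list; `trace_feature_shape` is a hypothetical helper, and it relies on the 3*3, padding=1 convolutions above preserving the spatial size.

```python
def trace_feature_shape(cfg, in_size=32, in_channels=3):
    """Return (channels, height, width) after the layers described by cfg."""
    size, channels = in_size, in_channels
    for x in cfg:
        if x == 'M':
            size //= 2          # MaxPool2d(kernel_size=2, stride=2) halves H and W
        else:
            channels = x        # 3x3 conv with padding=1 keeps H and W
    return channels, size, size


vgg11_cfg = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
channels, height, width = trace_feature_shape(vgg11_cfg)
print(channels, height, width, channels * height * width)
```

For a different input resolution the flattened size changes, and the `in_features` of the linear classifier would have to be adjusted accordingly.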

# Hyper-parameters
num_epochs = 10
lr = 1e-3
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# optimizer
optimizer = torch.optim.Adam(vggnet.parameters(), lr=lr)

fit(vggnet, num_epochs, optimizer, device)

Epoch 1/10:
Step [100/500] Train Loss: 1.6253
Step [200/500] Train Loss: 1.4231
Step [300/500] Train Loss: 1.3688
Step [400/500] Train Loss: 1.3814
Step [500/500] Train Loss: 0.9911
Accuracy on Test Set: 57.4000 %
Epoch 2/10:
Step [100/500] Train Loss: 1.8048
Step [200/500] Train Loss: 1.4972
Step [300/500] Train Loss: 1.3364
Step [400/500] Train Loss: 1.2925
Step [500/500] Train Loss: 1.1823
Accuracy on Test Set: 58.4400 %
Epoch 3/10:
Step [100/500] Train Loss: 1.1463
Step [200/500] Train Loss: 0.9488
Step [300/500] Train Loss: 1.1180
Step [400/500] Train Loss: 0.9506
Step [500/500] Train Loss: 0.8822
Accuracy on Test Set: 69.1200 %
Epoch 4/10:
Step [100/500] Train Loss: 0.9562
Step [200/500] Train Loss: 0.7132
Step [300/500] Train Loss: 0.7834
Step [400/500] Train Loss: 0.9923
Step [500/500] Train Loss: 0.6245
Accuracy on Test Set: 74.0900 %
Epoch 5/10:
Step [100/500] Train Loss: 0.6804
Step [200/500] Train Loss: 0.7942
Step [300/500] Train Loss: 0.6620
Step [400/500] Train Loss: 0.5886
Step [500/500] Train Loss: 0.6147
Accuracy on Test Set: 78.1000 %
Epoch 6/10:
Step [100/500] Train Loss: 0.4513
Step [200/500] Train Loss: 0.6562
Step [300/500] Train Loss: 0.5617
Step [400/500] Train Loss: 0.6486
Step [500/500] Train Loss: 0.6400
Accuracy on Test Set: 78.4500 %
Epoch 7/10:
Step [100/500] Train Loss: 0.6970
Step [200/500] Train Loss: 0.5626
Step [300/500] Train Loss: 0.4481
Step [400/500] Train Loss: 0.5924
Step [500/500] Train Loss: 0.5008
Accuracy on Test Set: 80.9900 %
Epoch 8/10:
Step [100/500] Train Loss: 0.5288
Step [200/500] Train Loss: 0.4491
Step [300/500] Train Loss: 0.5524
Step [400/500] Train Loss: 0.5024
Step [500/500] Train Loss: 0.4200
Accuracy on Test Set: 81.3000 %
Epoch 9/10:
Step [100/500] Train Loss: 0.5242
Step [200/500] Train Loss: 0.4221
Step [300/500] Train Loss: 0.4665
Step [400/500] Train Loss: 0.6280
Step [500/500] Train Loss: 0.5573
Accuracy on Test Set: 81.2000 %
Epoch 10/10:
Step [100/500] Train Loss: 0.3493
Step [200/500] Train Loss: 0.5310
Step [300/500] Train Loss: 0.6748
Step [400/500] Train Loss: 0.4147
Step [500/500] Train Loss: 0.4272
Accuracy on Test Set: 83.5300 %

image

Last edited: 2019.08.02 09:05:38
