If you would rather skip my rambling, jump straight to the code section, copy it, and follow the demo in the docstring.

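Preface#

In last semester's machine vision course project I used ResNet50-Unet, and over the winter break I used ResNet again for a classification task. Until now, though, I had either pip-installed a package and imported it, or edited the starter code the TAs provided with the homework. Wanting to properly understand a model as classic as ResNet, I decided to write it from scratch myself (well, I still followed a reference, but I added a few things on top of it).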

The basic idea of ResNet#

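ResNet adds shortcut connections that bypass a few layers and feed a block's input directly to its output. This mitigates vanishing gradients during backpropagation, so the network can be built much deeper without the accuracy degradation seen in plain deep networks.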
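In the notation of the original paper, a residual block computes the first expression below, where $\mathcal{F}$ is the stacked convolution path and $x$ is the block input. The gradient on the right is a standard chain-rule consequence, added here only as a sketch of why the shortcut helps; it assumes an identity shortcut and ignores the ReLU applied after the addition.

```latex
y = \mathcal{F}(x, \{W_i\}) + x
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}
    \left( \frac{\partial \mathcal{F}}{\partial x} + I \right)
```

Because of the identity term $I$, part of the gradient reaches $x$ without passing through the weights of $\mathcal{F}$.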

The table below lists the top-5 error rates of various CNN architectures on the ImageNet dataset; ResNet achieves better results than VGG and the other architectures.

Code#

"""
Implementation of ResNet with pytorch
Simple usage:
    from resnet_pytorch import *
    classifier = resnet()

All usage:
    demo:
        Custom resnet:
            classifier = resnet(resblock_basic, 3, [64, 128, 256, 512], [1, 1, 1, 1], 11)
        Only specify the type of resnet:
            classifier = resnet(class_num=12, net_type="resnet101")
        Input no parameters:
            classifier = resnet(class_num=16) # return resnet50

References:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
    Deep Residual Learning for Image Recognition
    https://arxiv.org/abs/1512.03385v1

[2] https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py

[3] 《深度学习计算机视觉》 Mohamed Elgendy, page 191-197
"""

import torch
import torch.nn as nn


class resblock_basic(nn.Module):
    """
        the block for resnet18 and resnet34
    """

    # in resblock for resnet18 and resnet34, the expansion of filters in the last layer of the block is 1
    # in bottleneck, the expansion of filters in the last layer of the block is 4
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        self.res_function = nn.Sequential(
            # if the kernel size equals to 3, the padding should be 1
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels * self.expansion)
        )

        # If the output size and number of channels are equal to the input's,
        # the shortcut path does nothing
        self.shortcut = nn.Sequential()

        # If the output size or number of channels differs from the input's,
        # the shortcut path is a 1x1 convolution layer (to match the shape) followed by a batch normalization layer
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        # add the residual path to the shortcut path, then apply ReLU to the sum
        return torch.relu(self.res_function(x) + self.shortcut(x))


class resblock_bottleneck(nn.Module):
    """
        the block for resnet50, resnet101 and resnet152
    """

    # in resblock for resnet18 and resnet34, the expansion of filters in the last layer of the block is 1
    # in bottleneck, the expansion of filters in the last layer of the block is 4
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        self.res_function = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # maxpooling is replaced by convolution layer with stride unequal to 1
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels * self.expansion)
        )

        # If the output size and number of channels are equal to the input's,
        # the shortcut path does nothing
        self.shortcut = nn.Sequential()

        # If the output size or number of channels differs from the input's,
        # the shortcut path is a 1x1 convolution layer (to match the shape) followed by a batch normalization layer
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        # add the residual path to the shortcut path, then apply ReLU to the sum
        return torch.relu(self.res_function(x) + self.shortcut(x))


class resnet(nn.Module):
    def __init__(self, block=resblock_bottleneck, channel=3, filter_list=None, block_num_list=None,
                 class_num=10, net_type=None):
        """
            block: type of block, default: bottleneck
            channel: the number of image channels, 1 for gray images and 3 for RGB images. default: 3
            filter_list: the number of filters in the first layer of each stage's blocks. default: None ([64, 128, 256, 512])
            block_num_list: how many blocks each stage repeats; its length should equal that of filter_list. default: None ([3, 4, 6, 3])
            class_num: the number of classes for classification. default: 10
            net_type: the type of resnet, 'resnet50' for example; when given, it overrides block, filter_list and block_num_list. default: None

            demo:
                Custom resnet:
                    classifier = resnet(resblock_basic, 3, [64, 128, 256, 512], [1, 1, 1, 1], 11)
                Only specify the type of resnet:
                    classifier = resnet(class_num=12, net_type="resnet101")
                Input no parameters:
                    classifier = resnet(class_num=16) # return resnet50
        """
        super().__init__()

        if block_num_list is None:
            block_num_list = [3, 4, 6, 3]
        if filter_list is None:
            filter_list = [64, 128, 256, 512]

        # different types of resnet in the original paper
        if net_type == 'resnet18':
            block = resblock_basic
            filter_list = [64, 128, 256, 512]
            block_num_list = [2, 2, 2, 2]
        elif net_type == 'resnet34':
            block = resblock_basic
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 6, 3]
        elif net_type == 'resnet50':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 6, 3]
        elif net_type == 'resnet101':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 23, 3]
        elif net_type == 'resnet152':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 8, 36, 3]

        self.resblock_in_channel = 64

        self.pre_conv_layer = nn.Sequential(
            nn.Conv2d(in_channels=channel, out_channels=self.resblock_in_channel, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(self.resblock_in_channel),
            nn.ReLU(inplace=True)
        )

        stride_list = [1] + [2] * (len(filter_list) - 1)
        self.resblocks = nn.ModuleList()
        for i in range(len(filter_list)):
            self.resblocks.append(self._make_block(block, filter_list[i], block_num_list[i], stride_list[i]))

        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))

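        # The last stage outputs filter_list[-1] * block.expansion channels
        # (e.g. 512 * 4 = 2048 for bottleneck blocks), so the linear layer must expect that many features
        self.fc = nn.Sequential(
            nn.Dropout(0.25),
            nn.Linear(filter_list[-1] * block.expansion, class_num)
        )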

    def _make_block(self, block, filter, block_num, stride):
        layers = []
        strides = [stride] + [1] * (block_num - 1)
        for stride in strides:
            layers.append(block(self.resblock_in_channel, filter, stride))
            self.resblock_in_channel = filter * block.expansion

        return nn.Sequential(*layers)

    def forward(self, x):
        output = self.pre_conv_layer(x)
        for block in self.resblocks:
            output = block(output)
        output = self.avg_pool(output)
        output = output.view(output.size(0), -1)
        output = self.fc(output)

        return output

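Walkthrough of each part#

Residual block#

The original ResNet paper proposes the two kinds of residual block shown in the figure below; the one on the right is called the bottleneck. The first kind is used in shallower ResNets such as ResNet18 and ResNet34; the second is used in deeper ones such as ResNet50, ResNet101 and ResNet152.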

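The resblock_basic class#

Forward pass: the input goes through two convolution + batch normalization steps. The first is followed by ReLU; the output of the second is added to the input coming through the shortcut path, and the sum then goes through ReLU.

Output channels: the two convolution layers in the block output the same number of channels.

Downsampling: a residual block has no max pooling; instead, a convolution layer with a stride greater than 1 does the downsampling. Only the first convolution layer in a block may have a stride greater than 1; the second always has stride 1.

Padding: since the kernels are 3x3, both convolution layers use padding 1, so that the kernel itself does not shrink the feature map.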
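The stride and padding choices above can be checked with the usual convolution output-size formula. The helper below is only an illustration (the name `conv_out_size` is made up here, and it assumes dilation 1 and square inputs):

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# 3x3 kernel with padding 1: stride 1 keeps the size, stride 2 halves it.
same = conv_out_size(56, kernel=3, stride=1, padding=1)
halved = conv_out_size(56, kernel=3, stride=2, padding=1)
# Without padding, the 3x3 kernel alone would shrink the map by 2 pixels.
shrunk = conv_out_size(56, kernel=3, stride=1, padding=0)
print(same, halved, shrunk)  # 56 28 54
```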

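Shortcut path: there are two cases. If the block's input has the same spatial size and number of channels as the output of the second convolution layer, the shortcut path does nothing. If the spatial size or the number of channels differs, the shortcut path needs a 1x1 convolution layer followed by batch normalization so that its output matches the shape of the residual path.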
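The two shortcut cases can be tried in isolation. This is a minimal sketch, not the code from this post: `make_shortcut` is a hypothetical helper, and it uses `nn.Identity()` where the listing above uses an empty `nn.Sequential()` (both pass the input through unchanged).

```python
import torch
import torch.nn as nn


def make_shortcut(in_channels: int, out_channels: int, stride: int) -> nn.Module:
    """Identity when shapes already match, otherwise a 1x1 conv + BN projection."""
    if stride == 1 and in_channels == out_channels:
        return nn.Identity()
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )


x = torch.randn(2, 64, 56, 56)
identity_out = make_shortcut(64, 64, stride=1)(x)    # shape unchanged
projected_out = make_shortcut(64, 128, stride=2)(x)  # channels doubled, size halved
print(tuple(identity_out.shape), tuple(projected_out.shape))
```

The projected tensor has the same shape the residual path would produce for `out_channels=128, stride=2`, which is what makes the element-wise addition possible.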

The resblock_bottleneck class#

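Forward pass: the input goes through three convolution + batch normalization steps. The first two are each followed by ReLU; the output of the third is added to the input coming through the shortcut path, and the sum then goes through ReLU.

Output channels: the first two convolution layers output the same number of channels, and the third outputs four times as many.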
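To get a feel for why the narrow-then-expand layout is used in deep variants, the weights can be counted by hand. The sketch below is illustrative arithmetic only: `conv_weights` is a made-up helper, biases and batch normalization parameters are ignored, and the 256/64 channel sizes are just the first-stage example.

```python
def conv_weights(in_channels: int, out_channels: int, kernel: int) -> int:
    """Number of weights in a square-kernel convolution, ignoring bias."""
    return in_channels * out_channels * kernel * kernel


# Bottleneck with 256 input channels and 64 inner channels (expansion 4).
bottleneck = (
    conv_weights(256, 64, 1)    # 1x1 conv reduces the channels
    + conv_weights(64, 64, 3)   # 3x3 conv works on the narrow representation
    + conv_weights(64, 256, 1)  # 1x1 conv expands the channels by 4
)
# Two 3x3 convolutions that stay at 256 channels, for comparison.
two_wide_3x3 = 2 * conv_weights(256, 256, 3)
print(bottleneck, two_wide_3x3)  # 69632 1179648
```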

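Downsampling: only the second layer, the 3x3 convolution, may have a stride greater than 1; the first and third layers, both 1x1 convolutions, have stride 1.

Padding: since the second layer has a 3x3 kernel, its padding is 1; the 1x1 layers need no padding.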

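The resnet class#

In the original paper, the input first goes through a 7x7 convolution layer, then a series of residual blocks, and finally a fully connected (FC) layer that produces the output.

Pre-convolution layer: in the original paper this layer has a 7x7 kernel, hence padding=3; its stride is 2, so it downsamples, and it outputs 64 channels. The paper also places a 3x3 max pooling layer with stride 2 after this convolution, which the code above leaves out.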
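As a rough check of how the feature map changes through this implementation, the sketch below traces the spatial size and channel count after the pre-convolution layer and after each stage. `trace_shapes` is a hypothetical helper; it assumes a square input and follows the code in this post (no max pooling after the first convolution), so the sizes are twice those in the paper's table.

```python
def trace_shapes(input_size, filter_list, expansion):
    """Return (spatial size, channels) after the stem and after each stage."""
    size = (input_size + 2 * 3 - 7) // 2 + 1  # 7x7 conv, stride 2, padding 3
    shapes = [(size, 64)]
    stride_list = [1] + [2] * (len(filter_list) - 1)
    for filters, stride in zip(filter_list, stride_list):
        # One layer per stage carries the stride; a 3x3 conv with padding 1 and
        # a 1x1 conv with padding 0 both give (size - 1) // stride + 1.
        size = (size - 1) // stride + 1
        shapes.append((size, filters * expansion))
    return shapes


bottleneck_shapes = trace_shapes(224, [64, 128, 256, 512], expansion=4)
basic_shapes = trace_shapes(224, [64, 128, 256, 512], expansion=1)
print(bottleneck_shapes)
print(basic_shapes)
```

With bottleneck blocks the last stage ends at 2048 channels, and with basic blocks at 512, which is the number of features the final linear layer receives after pooling.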

Residual block sequence: the stages of residual blocks are stored in an nn.ModuleList and built in a loop by the _make_block function.
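The block_num_list values used for each net_type can be sanity-checked against the network names. The sketch below assumes the usual counting convention: one pre-convolution layer, the convolutions on the residual path of every block (2 for the basic block, 3 for the bottleneck), and one fully connected layer; the 1x1 convolutions on shortcut paths are not counted. `CONFIGS` and `weighted_depth` are names made up for this example.

```python
# (convolutions per block, blocks per stage), copied from the net_type branches above
CONFIGS = {
    "resnet18": (2, [2, 2, 2, 2]),
    "resnet34": (2, [3, 4, 6, 3]),
    "resnet50": (3, [3, 4, 6, 3]),
    "resnet101": (3, [3, 4, 23, 3]),
    "resnet152": (3, [3, 8, 36, 3]),
}


def weighted_depth(convs_per_block, block_num_list):
    """Pre-conv layer + residual-path convolutions + final FC layer."""
    return 1 + convs_per_block * sum(block_num_list) + 1


depths = {name: weighted_depth(*config) for name, config in CONFIGS.items()}
print(depths)
```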

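Adaptive average pooling layer: averages each feature map down to 1x1 regardless of its input size, so that after flattening, every sample becomes a vector with one value per channel.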
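A small sketch of what this layer does, using arbitrary example sizes: whatever the height and width of the incoming feature map, `nn.AdaptiveAvgPool2d((1, 1))` returns one value per channel, equal to the mean over the spatial dimensions.

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))

for height, width in [(7, 7), (14, 10)]:
    feature_map = torch.randn(2, 512, height, width)
    pooled = pool(feature_map)              # (2, 512, 1, 1) for any height/width
    flat = pooled.view(pooled.size(0), -1)  # (2, 512), ready for the linear layer
    # The pooled value is simply the per-channel spatial mean.
    matches_mean = torch.allclose(flat, feature_map.mean(dim=(2, 3)), atol=1e-5)
    print(tuple(pooled.shape), tuple(flat.shape), matches_mean)
```

This is why the network accepts inputs of different resolutions without changing the size of the fully connected layer.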

Fully connected layer: dropout with rate 0.25, followed by a linear layer that maps to class_num outputs.

References#

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385v1
https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py
《深度学习计算机视觉》, Mohamed Elgendy, pp. 191-197

This post is licensed under CC-BY-SA-3.0; please credit the source when reposting.
Author: 核子