If you don't want to listen to me ramble, you can jump straight to the Code section, copy it, and use it by following the demo in the comments.

Preface#

I used a ResNet50-UNet in last semester's machine vision course project, and used ResNet again for a classification task over the winter break. Until now, though, I had only ever used ResNet either by pip-installing a package and importing it directly, or by tweaking the starter code the TAs provided in the homework. Wanting to really understand such a classic model, I decided to implement ResNet by hand (well, I still referred to existing code, but added a few things of my own on top of it).

The Basic Idea of ResNet#

ResNet introduces shortcut connections that bypass the stacked layers. These shortcuts alleviate the vanishing-gradient problem during backpropagation, so the network can be built much deeper without the accuracy degradation that plain deep networks suffer from.

(Figure: a residual block with its shortcut connection)
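
To make this concrete, here is a minimal, self-contained sketch of the residual idea (the class name and layer sizes are purely illustrative, not part of the implementation below): the stacked layers learn a residual function F(x), the shortcut carries x through unchanged, and the block outputs ReLU(F(x) + x), so gradients always have a direct path back through the addition.

```python
import torch
import torch.nn as nn

# Minimal sketch of the residual idea: the stacked layers learn F(x), the
# shortcut carries x through untouched, and the block outputs ReLU(F(x) + x).
class TinyResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # the "+ x" term gives gradients a direct path back to earlier layers
        return torch.relu(self.f(x) + x)
```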

The table below lists the top-5 error rates of various CNN architectures on the ImageNet dataset; ResNet achieves a lower error rate than VGG and other earlier architectures.

(Figure: table of top-5 error rates of different CNN architectures on ImageNet)

Code#

"""
Implementation of ResNet with PyTorch.

Simple usage:
    from resnet_pytorch import *
    classifier = resnet()

All usage:
    demo:
        Customized resnet:
            classifier = resnet(resblock_basic, 3, [64, 128, 256, 512], [1, 1, 1, 1], 11)
        Only specify the type of resnet:
            classifier = resnet(class_num=12, net_type="resnet101")
        No extra parameters:
            classifier = resnet(class_num=16)  # returns a resnet50

References:
    [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
        Deep Residual Learning for Image Recognition
        https://arxiv.org/abs/1512.03385v1

    [2] https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py

    [3] Mohamed Elgendy, Deep Learning for Vision Systems (《深度学习计算机视觉》), pp. 191-197
"""

import torch
import torch.nn as nn


class resblock_basic(nn.Module):
    """
    the block for resnet18 and resnet34
    """

    # in resblock for resnet18 and resnet34, the expansion of filters in the last layer of the block is 1
    # in bottleneck, the expansion of filters in the last layer of the block is 4
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        self.res_function = nn.Sequential(
            # if the kernel size equals 3, the padding should be 1
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels * self.expansion)
        )

        # If the output size and number of channels are equal to the input,
        # the shortcut path does nothing
        self.shortcut = nn.Sequential()

        # If the output size or number of channels is unequal to the input,
        # the shortcut path should be a sequence of a 1x1 convolution layer to downsample and a batch normalization layer
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        return nn.ReLU(inplace=True)(self.res_function(x) + self.shortcut(x))


class resblock_bottleneck(nn.Module):
    """
    the block for resnet50, resnet101 and resnet152
    """

    # in resblock for resnet18 and resnet34, the expansion of filters in the last layer of the block is 1
    # in bottleneck, the expansion of filters in the last layer of the block is 4
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        self.res_function = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # maxpooling is replaced by a convolution layer with stride unequal to 1
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_channels * self.expansion)
        )

        # If the output size or number of channels is unequal to the input,
        # the shortcut path should be a sequence of a 1x1 convolution layer to downsample and a batch normalization layer
        self.shortcut = nn.Sequential()

        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        return nn.ReLU(inplace=True)(self.res_function(x) + self.shortcut(x))


class resnet(nn.Module):
    def __init__(self, block=resblock_bottleneck, channel=3, filter_list=None, block_num_list=None,
                 class_num=10, net_type=None):
        """
        block: type of block, default: bottleneck
        channel: the number of channels of the input image, 1 for a grayscale image and 3 for an RGB image. default: 3
        filter_list: the filter numbers of each block group's first layer. default: None
        block_num_list: repeat times for each block group, the length should be equal to filter_list. default: None
        class_num: the number of classes for classification. default: 10
        net_type: the type of resnet, 'resnet50' for example. default: None

        demo:
            Customized resnet:
                classifier = resnet(resblock_basic, 3, [64, 128, 256, 512], [1, 1, 1, 1], 11)
            Only specify the type of resnet:
                classifier = resnet(class_num=12, net_type="resnet101")
            No extra parameters:
                classifier = resnet(class_num=16)  # returns a resnet50
        """
        super().__init__()

        if block_num_list is None:
            block_num_list = [3, 4, 6, 3]
        if filter_list is None:
            filter_list = [64, 128, 256, 512]

        # different types of resnet in the original paper
        if net_type == 'resnet18':
            block = resblock_basic
            filter_list = [64, 128, 256, 512]
            block_num_list = [2, 2, 2, 2]
        elif net_type == 'resnet34':
            block = resblock_basic
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 6, 3]
        elif net_type == 'resnet50':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 6, 3]
        elif net_type == 'resnet101':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 4, 23, 3]
        elif net_type == 'resnet152':
            block = resblock_bottleneck
            filter_list = [64, 128, 256, 512]
            block_num_list = [3, 8, 36, 3]

        self.resblock_in_channel = 64

        self.pre_conv_layer = nn.Sequential(
            nn.Conv2d(in_channels=channel, out_channels=self.resblock_in_channel, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(self.resblock_in_channel),
            nn.ReLU(inplace=True)
        )

        stride_list = [1] + [2] * (len(filter_list) - 1)
        self.resblocks = nn.ModuleList()
        for i in range(len(filter_list)):
            self.resblocks.append(self._make_block(block, filter_list[i], block_num_list[i], stride_list[i]))

        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))

        self.fc = nn.Sequential(
            nn.Dropout(0.25),
            # the last block group outputs filter_list[-1] * block.expansion channels
            nn.Linear(filter_list[-1] * block.expansion, class_num)
        )

    def _make_block(self, block, filter, block_num, stride):
        layers = []
        strides = [stride] + [1] * (block_num - 1)
        for stride in strides:
            layers.append(block(self.resblock_in_channel, filter, stride))
            self.resblock_in_channel = filter * block.expansion

        return nn.Sequential(*layers)

    def forward(self, x):
        output = self.pre_conv_layer(x)
        for block in self.resblocks:
            output = block(output)
        output = self.avg_pool(output)
        output = output.view(output.size(0), -1)
        output = self.fc(output)

        return output
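
A short smoke test for the code above (assuming it is saved as resnet_pytorch.py; the 224x224 input size is just an example):

```python
import torch
from resnet_pytorch import resnet, resblock_basic  # assumes the file name above

x = torch.randn(2, 3, 224, 224)                     # a dummy batch of 2 RGB images

model = resnet(class_num=12, net_type="resnet101")  # pick a preset architecture
print(model(x).shape)                               # torch.Size([2, 12])

custom = resnet(resblock_basic, 3, [64, 128, 256, 512], [1, 1, 1, 1], 11)
print(custom(x).shape)                              # torch.Size([2, 11])
```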

Detailed Explanation of Each Part#

Residual block#

The original ResNet paper proposes the two kinds of residual block shown below; the one on the right is called the bottleneck. The first kind is used in the shallower ResNets such as ResNet18 and ResNet34, while the bottleneck is used in the deeper ones such as ResNet50, ResNet101, and ResNet152.

(Figure: the basic residual block (left) and the bottleneck block (right) from the original paper)

The resblock_basic class#

Forward pass: the input goes through two convolution + batch normalization stages. The first convolution and batch normalization are followed by a ReLU; after the second convolution and batch normalization, the result is added to the input arriving through the shortcut path, and a ReLU is applied to the sum.

Output channels: within the block, both convolution layers output the same number of channels.

Downsampling: the residual block contains no max pooling; downsampling is achieved by giving a convolution layer a stride greater than 1. Only the first convolution layer in a block may use a stride greater than 1; the second always uses stride 1.

Padding: since the kernel size is 3x3, both convolution layers use padding 1.

Shortcut path: there are two cases. If the block's input has the same spatial size and channel count as the output of the second convolution layer, the shortcut path does nothing; if the spatial size or the channel count differs, a 1x1 convolution layer plus batch normalization is inserted on the shortcut path to match the dimensions (downsampling where necessary).
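
The two cases can be checked directly from the block's attributes; a small check, assuming the classes above are importable from resnet_pytorch:

```python
import torch
from resnet_pytorch import resblock_basic  # assumes the file name used earlier

# Case 1: same channels, stride 1 -> the shortcut is an empty Sequential (identity).
same = resblock_basic(64, 64, stride=1)
print(same.shortcut)            # Sequential()

# Case 2: channels change and stride 2 -> a 1x1 conv + BatchNorm sits on the shortcut.
down = resblock_basic(64, 128, stride=2)
x = torch.randn(1, 64, 56, 56)
print(down(x).shape)            # torch.Size([1, 128, 28, 28])
```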

The resblock_bottleneck class#

Forward pass: the input goes through three convolution + batch normalization stages. The first two are each followed by a ReLU; after the third convolution and batch normalization, the result is added to the input arriving through the shortcut path, and a ReLU is applied to the sum.

Output channels: within the block, the first two convolution layers output the same number of channels, while the third outputs four times as many.

Downsampling: only the middle 3x3 convolution layer in a block may use a stride greater than 1; the first and third 1x1 convolution layers use stride 1.

Padding: since the second layer's kernel is 3x3, the second convolution layer uses padding 1.

Shortcut path: there are two cases. If the block's input has the same spatial size and channel count as the output of the last (third) convolution layer, the shortcut path does nothing; if the spatial size or the channel count differs, a 1x1 convolution layer plus batch normalization is inserted on the shortcut path to match the dimensions (downsampling where necessary).
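
A small check of the 4x expansion and the shortcut behaviour (again assuming the classes above are importable from resnet_pytorch):

```python
import torch
from resnet_pytorch import resblock_bottleneck  # assumes the file name used earlier

# 1x1 reduces to 64 channels, 3x3 keeps 64, the final 1x1 expands to 64 * 4 = 256.
block = resblock_bottleneck(256, 64, stride=1)
print(block.shortcut)           # Sequential() -- input already has 256 channels, stride 1

x = torch.randn(1, 256, 56, 56)
print(block(x).shape)           # torch.Size([1, 256, 56, 56])
```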

The resnet class#

In the original paper, ResNet first passes the input through a 7x7 convolution layer, then through a series of residual blocks, and finally through a fully connected layer to produce the output.

Pre-convolution (stem) layer: in the original paper its kernel size is 7x7, so padding=3; it uses stride 2, which downsamples the input, and its output channel count is set to 64.

Residual block sequence: the middle sequence of residual block groups is stored in an nn.ModuleList and appended in a loop by the _make_block function (a short trace of the strides it produces is shown below).
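
An illustrative trace of the strides _make_block generates (plain Python, mirroring the loop in __init__ for the resnet50 configuration):

```python
# Each group's first block may downsample; the remaining blocks keep stride 1.
filter_list = [64, 128, 256, 512]
block_num_list = [3, 4, 6, 3]                          # resnet50 configuration
stride_list = [1] + [2] * (len(filter_list) - 1)       # [1, 2, 2, 2]

for filters, num, stride in zip(filter_list, block_num_list, stride_list):
    print(filters, [stride] + [1] * (num - 1))
# 64 [1, 1, 1]
# 128 [2, 1, 1, 1]
# 256 [2, 1, 1, 1, 1, 1]
# 512 [2, 1, 1]
```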

Adaptive average pooling layer: pools each channel's feature map down to 1x1, turning the feature maps into a fixed-length vector regardless of the input image size.

Fully connected layer: a dropout rate of 0.25 is applied, followed by a linear layer whose input dimension is the channel count of the last block group, i.e. filter_list[-1] * block.expansion.
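
Putting the pieces together, a shape walk-through for a 224x224 input on resnet50 (assuming the class is importable from resnet_pytorch; note this implementation has no max pooling after the stem, so the last feature map is 14x14 rather than the paper's 7x7):

```python
import torch
from resnet_pytorch import resnet  # assumes the file name used earlier

model = resnet(class_num=10, net_type="resnet50")
feat = model.pre_conv_layer(torch.randn(1, 3, 224, 224))  # [1, 64, 112, 112], stride-2 7x7 stem
for group in model.resblocks:
    feat = group(feat)                                     # ends at [1, 2048, 14, 14]
feat = model.avg_pool(feat)                                # [1, 2048, 1, 1]
feat = feat.view(feat.size(0), -1)                         # [1, 2048]
print(model.fc(feat).shape)                                # torch.Size([1, 10])
```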

References#

  1. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385v1
  2. https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py
  3. Mohamed Elgendy. Deep Learning for Vision Systems (《深度学习计算机视觉》), pp. 191-197.

This post is licensed under CC-BY-SA-3.0; please credit the source when reposting.
Author: 核子