[Image Classification] MobileNet V1 and V2: Theory and Practice

btikc 2024-11-01 11:26:50

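Around 2017, researchers began studying how to deploy models on lightweight hardware such as mobile or embedded devices.

Why not simply deploy the existing networks on those devices as they are?

Conventional convolutional neural networks need a lot of memory, with parameters that easily run past a hundred megabytes, and they also require a very large amount of computation. On mobile devices they are almost impossible to run.

This study material is based on the Bilibili videos by 霹雳吧拉Wz.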

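MobileNet · V1

To make these models usable in everyday applications, a series of lightweight networks appeared one after another, such as MobileNet and ShuffleNet.

In 2017 a Google team proposed MobileNet, which we now call v1. The paper reports that, compared with the conventional VGG16 network, accuracy on ImageNet drops by 0.9%, while the model has only about 1/32 of VGG's parameters.

Highlights of the paper:

1. It mainly relies on Depthwise Convolution to reduce both the amount of computation and the number of parameters.

2. It also introduces two hyperparameters, α and β (the paper writes the second one as ρ). The former scales the number of channels in each layer, and the latter scales the input resolution.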

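We are already familiar with the standard convolution. So what exactly is the dw convolution used in this paper?

A dw convolution, also called a depthwise convolution, uses kernels that have only 1 channel each, and each channel of the input is processed by exactly one kernel. The output therefore has the same number of channels as the input.

And what is a depthwise separable convolution?

Depthwise separable convolution = dw convolution + pw convolution. We already know the dw convolution, so what is the pw convolution?

A pw convolution, also called a pointwise convolution, uses kernels of size 1×1×M (M is the number of input channels). The number of kernels equals the number of output feature maps.

A pw convolution is no different from a standard convolution, except that its kernel size is 1×1.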
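The two pieces can be sketched in PyTorch as follows. This is only a minimal illustration: the channel counts M=8 and N=16 and the 32×32 input are made-up values, and the grouped `nn.Conv2d` is one common way to express a depthwise convolution, not necessarily how the paper's authors implemented it.

```python
import torch
from torch import nn

M, N = 8, 16  # hypothetical input / output channel counts

# Depthwise: groups=M gives every input channel its own 3x3 single-channel kernel.
depthwise = nn.Conv2d(M, M, kernel_size=3, padding=1, groups=M, bias=False)
# Pointwise: N kernels of size 1x1xM mix information across channels.
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)
# A standard 3x3 convolution with the same input/output shape, for comparison.
standard = nn.Conv2d(M, N, kernel_size=3, padding=1, bias=False)


def count_params(module):
    """Total number of learnable weights in a module."""
    return sum(p.numel() for p in module.parameters())


x = torch.randn(1, M, 32, 32)
y_separable = pointwise(depthwise(x))
y_standard = standard(x)

separable_params = count_params(depthwise) + count_params(pointwise)
standard_params = count_params(standard)
print(tuple(y_separable.shape), tuple(y_standard.shape))
print(separable_params, standard_params)
```

Both paths produce an output of the same shape, but the separable version needs 3·3·M + M·N weights instead of 3·3·M·N.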

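Now that we know what a depthwise separable convolution is, why was it proposed? Because it reduces the amount of computation. How exactly does it do that?

Note that we are counting operations here, not parameters. If the difference is unclear, it is worth looking up before going on.

Start with the standard convolution. One kernel performs Dk×Dk×M multiplications to produce one output pixel, where Dk is the kernel size, M is the number of input channels, and Df is the spatial size of the input feature map. The output has the same spatial size as the input here because padding=1, kernel_size=3, stride=1.

That is the cost of a single output value. The output contains Df×Df×N values in total, where N is the number of kernels (output channels), so a standard convolution costs Dk×Dk×M×N×Df×Df.

By the same counting, the dw convolution costs Dk×Dk×M×Df×Df and the pw convolution costs M×N×Df×Df. Dividing their sum by the standard cost gives 1/N + 1/(Dk×Dk), so with 3×3 kernels the depthwise separable convolution needs roughly 8 to 9 times less computation. That covers the first highlight.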
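The counting argument can be checked numerically. The sketch below counts multiplications only, and the layer sizes (3×3 kernel, 64 input channels, 128 output channels, 56×56 feature map) are arbitrary values picked for illustration.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiplications for a standard convolution with 'same' output size."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df):
    """Multiplications for a depthwise convolution followed by a pointwise one."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


dk, m, n, df = 3, 64, 128, 56  # hypothetical layer sizes
std = standard_conv_cost(dk, m, n, df)
sep = separable_conv_cost(dk, m, n, df)
ratio = sep / std

print(f"standard:  {std:,}")
print(f"separable: {sep:,}")
print(f"ratio: {ratio:.4f}, closed form 1/N + 1/Dk^2: {1 / n + 1 / dk ** 2:.4f}")
```

For these sizes the separable version needs about 8.4 times fewer multiplications. The saving approaches 9 times as N grows, because the 1/N term vanishes.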

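The second highlight is the pair of hyperparameters, which simply modify the cost computed above. α acts on the input and output channel counts, M and N. β acts on the spatial size of the feature maps, Df (the kernel size Dk is not changed). The right-hand side of the figure in the original post lists the results obtained with different values of these hyperparameters.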
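A rough sketch of how the two multipliers enter the cost formula is given below. The function name `separable_cost`, the truncation with `int`, and the layer sizes are all assumptions made for this illustration; real implementations usually also round channel counts to a multiple of 8.

```python
def separable_cost(dk, m, n, df, alpha=1.0, beta=1.0):
    """Multiplications of one depthwise separable layer with both multipliers applied."""
    m_scaled = int(alpha * m)   # alpha thins the input channels
    n_scaled = int(alpha * n)   # ... and the output channels
    df_scaled = int(beta * df)  # beta shrinks the feature-map resolution
    depthwise = dk * dk * m_scaled * df_scaled * df_scaled
    pointwise = m_scaled * n_scaled * df_scaled * df_scaled
    return depthwise + pointwise


base = separable_cost(3, 64, 128, 56)              # alpha = beta = 1
thin = separable_cost(3, 64, 128, 56, alpha=0.5)   # half the channels
small = separable_cost(3, 64, 128, 56, beta=0.5)   # half the resolution

print(f"alpha=0.5 keeps {thin / base:.2%} of the cost")
print(f"beta=0.5 keeps {small / base:.2%} of the cost")
```

Halving the resolution cuts the cost to exactly a quarter, since Df enters squared. Halving the channels cuts it to a little more than a quarter, because the depthwise term scales only linearly with α.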

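The left-hand side of that figure shows how v1 is built. It resembles VGG, except that the network is a stack of the new type of convolution.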

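MobileNet V2 · Theory

In practice, however, many of the dw kernels in MobileNet v1 end up with mostly zero weights after training, so they are effectively wasted. Google then proposed the V2 version.

Highlights of v2:

It mainly introduces the inverted residual structure and the linear bottleneck.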

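The earlier residual block first reduces the channel dimension with a 1×1 convolution, then applies a standard 3×3 convolution, and finally expands the dimension again with a 1×1 convolution. This paper does exactly the opposite: it expands first (1×1), applies a 3×3 dw convolution, and then reduces (1×1).

According to the paper, the activation function used inside the block is ReLU6. The last layer of the inverted residual structure, however, outputs low-dimensional features, so it uses a linear activation, which can also be read as having no activation function at all.

The reason is that the experiments in the paper show that ReLU causes a large loss of information on low-dimensional features.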

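A bottleneck can be viewed as one block: pw convolutions at both ends and a dw convolution in the middle. Note that the third convolution layer uses a linear activation.

The other point is the shortcut of the residual structure: when is it present and when not? The add is only possible when the input and output feature maps have the same shape, that is, when stride=1 and the input and output channel counts are equal.

As for building the V2 model, it stacks many of these blocks. Each group of blocks is described by 4 parameters, t, c, n and s: t is the expansion factor, c is the number of output channels, n is how many times the block is repeated, and s is the stride of the first block in the group (the remaining blocks use stride 1).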
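To see what the four parameters mean in practice, the sketch below expands the t, c, n, s table used in the code later in this article into a flat list of blocks. It assumes α=1 (so no channel rounding is needed) and a first convolution with 32 output channels; the helper name `expand_blocks` is made up for this example.

```python
# (t, c, n, s) rows, copied from the model code in this article
settings = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def expand_blocks(settings, input_channel=32):
    """Return one (in_ch, out_ch, stride, expand_ratio) tuple per block."""
    blocks = []
    for t, c, n, s in settings:
        for i in range(n):
            stride = s if i == 0 else 1  # only the first block of a group downsamples
            blocks.append((input_channel, c, stride, t))
            input_channel = c
    return blocks


blocks = expand_blocks(settings)
# A shortcut is possible only when the block preserves shape.
with_shortcut = [b for b in blocks if b[2] == 1 and b[0] == b[1]]

for in_ch, out_ch, stride, t in blocks:
    print(f"{in_ch:>3} -> {out_ch:>3}  stride={stride}  t={t}")
print(len(blocks), "blocks,", len(with_shortcut), "with shortcut")
```

Only the repeated blocks inside a group get a shortcut. The first block of each group changes the channel count (and often the stride), so its input and output cannot be added.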

Hands-on code · Building the model

from torch import nn
import torch


def _make_divisible(ch, divisor=8, min_ch=None):
    """Round ch to the nearest multiple of divisor, never going below min_ch."""
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )
class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel
        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])
        self.conv = nn.Sequential(*layers)
    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)
        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]
        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)
        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)
    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
