Depthwise Separable Convolution

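Depthwise separable convolution is an efficient form of convolution that factorizes a standard convolution into two steps, a depthwise convolution followed by a pointwise convolution, retaining most of the feature-extraction ability at a much lower computational cost. Compared with a standard convolution it can cut computation by roughly 88% (for a 3×3 kernel with 128 output channels), and the parameter count drops substantially as well, which makes it well suited to mobile and embedded deployment (e.g., MobileNet). There is a small accuracy cost, and the depthwise step performs no cross-channel mixing on its own, but the 1×1 convolution fuses the channels and keeps the representation reasonably expressive. It fits lightweight models (e.g., object detection, face recognition) and is a weaker fit for tasks dominated by complex inter-channel relationships.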

张克飞412 · published 2025-10-24 21:37:31

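1. Introduction: Why Depthwise Separable Convolution?

Early convolutional neural networks (CNNs) such as VGG and ResNet rely on standard convolution.
It extracts spatial and cross-channel features effectively, but it is computationally expensive, which makes deployment on mobile or embedded devices difficult.

The core tension:

More features → more convolution kernels → parameter count balloons

Less computation → shallower network → weaker representational power

To resolve this tension between accuracy and efficiency, Google's MobileNet (2017) built its architecture around depthwise separable convolution, a factorization that predates MobileNet and that MobileNet popularized. It splits a conventional convolution into two steps:
"spatial convolution" + "channel fusion", which greatly reduces computation while largely preserving accuracy.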

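2. Standard Convolution Recap

Let the input feature map be
$$ X \in \mathbb{R}^{H \times W \times C_{in}} $$
and the output feature map be
$$ Y \in \mathbb{R}^{H' \times W' \times C_{out}} $$
with a kernel of size $K \times K$.

The $j$-th output channel of a standard convolution is
$$ Y_j = \sum_{i=1}^{C_{in}} X_i * W_{ij} $$
where $W_{ij} \in \mathbb{R}^{K \times K}$ and $*$ denotes 2D convolution (bias omitted).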

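🧮 Parameters and computation:

Parameter count:
$$ P_{standard} = K^2 \times C_{in} \times C_{out} $$
Computation (FLOPs, counted here as multiply-accumulates):
$$ F_{standard} = H' \times W' \times K^2 \times C_{in} \times C_{out} $$

Both the parameter count and the computation scale with the product $C_{in} \times C_{out}$, so the cost becomes very high when $C_{in}$ and $C_{out}$ are large.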
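To make the scaling concrete, the two formulas above can be evaluated directly. This is a minimal sketch: the function name `standard_conv_cost` and the layer sizes are assumptions chosen for illustration, not values taken from a particular network.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, multiply-accumulates) of a K x K standard convolution.

    Bias terms are ignored, matching the formulas in the text.
    """
    params = k * k * c_in * c_out
    macs = h_out * w_out * params
    return params, macs


# Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 112 x 112 output
params, macs = standard_conv_cost(k=3, c_in=32, c_out=64, h_out=112, w_out=112)
print(params)  # 18432
print(macs)    # 231211008

# Doubling both channel counts quadruples the cost
params2, macs2 = standard_conv_cost(k=3, c_in=64, c_out=128, h_out=112, w_out=112)
print(params2 // params, macs2 // macs)  # 4 4
```

The factor of four from doubling both channel counts is the multiplicative scaling that the rest of the article sets out to avoid.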

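3. Depthwise Convolution

3.1 Mathematical definition

The idea of depthwise convolution is to
convolve each input channel on its own, with no mixing across channels.

Each input channel $X_i$ gets exactly one kernel $W_i$:
$$ Y_i = X_i * W_i $$
where $W_i \in \mathbb{R}^{K \times K}$.

As a result, the number of output channels equals the number of input channels.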
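The definition above can be spelled out in a few lines of plain Python. This is a naive sketch for illustration only: the helper names `conv2d_valid` and `depthwise_conv` and the toy values are assumptions, stride is fixed at 1 with no padding, and, as in deep learning frameworks, the kernel is not flipped (cross-correlation).

```python
def conv2d_valid(channel, kernel):
    """Slide one K x K kernel over one 2D channel (stride 1, no padding)."""
    k = len(kernel)
    h_out = len(channel) - k + 1
    w_out = len(channel[0]) - k + 1
    return [
        [
            sum(channel[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(w_out)
        ]
        for r in range(h_out)
    ]


def depthwise_conv(x, kernels):
    """Apply kernels[i] to channel x[i] only; channels are never summed together."""
    assert len(x) == len(kernels), "depthwise needs one kernel per input channel"
    return [conv2d_valid(ch, w) for ch, w in zip(x, kernels)]


# Two 3x3 input channels and two 2x2 kernels (toy values)
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],   # channel 1
]
kernels = [
    [[1, 0], [0, 1]],   # kernel for channel 0
    [[1, 1], [1, 1]],   # kernel for channel 1
]
y = depthwise_conv(x, kernels)
print(len(y))  # 2 output channels, same as the input
print(y[0])    # [[6, 8], [12, 14]]
print(y[1])    # [[2, 2], [2, 2]]
```

Changing the values in channel 1 leaves `y[0]` untouched, which is exactly the "no cross-channel mixing" property.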

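3.2 Parameter count

$$ P_{DW} = K^2 \times C_{in} $$

Relative to standard convolution, the parameter ratio is
$$ \frac{P_{DW}}{P_{standard}} = \frac{1}{C_{out}} $$

3.3 Computation

$$ F_{DW} = H' \times W' \times K^2 \times C_{in} $$
The computation grows linearly with the channel count rather than with the product $C_{in} \times C_{out}$.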
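As a quick numerical check of the $1/C_{out}$ ratio, the sketch below evaluates both cost formulas for one layer. The function names and layer sizes are assumptions for illustration. Note that it compares the spatial step only: a depthwise convolution by itself still outputs $C_{in}$ channels, not $C_{out}$.

```python
from fractions import Fraction


def depthwise_cost(k, c_in, h_out, w_out):
    """Parameters and multiply-accumulates of a K x K depthwise convolution (no bias)."""
    params = k * k * c_in
    return params, h_out * w_out * params


def standard_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and multiply-accumulates of a K x K standard convolution (no bias)."""
    params = k * k * c_in * c_out
    return params, h_out * w_out * params


# Hypothetical layer sizes
k, c_in, c_out, h_out, w_out = 3, 32, 64, 56, 56
dw_params, dw_macs = depthwise_cost(k, c_in, h_out, w_out)
std_params, std_macs = standard_cost(k, c_in, c_out, h_out, w_out)

print(dw_params)                        # 288
print(Fraction(dw_params, std_params))  # 1/64
print(Fraction(dw_macs, std_macs))      # 1/64
```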

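4. Pointwise Convolution

4.1 Mathematical definition

Pointwise convolution is simply a 1×1 convolution. Its role is to

fuse the channels of the depthwise output through a learned linear transformation.

$$ Y_j = \sum_{i=1}^{C_{in}} X_i * W_{ij}^{1\times1} $$
Here $X_i$ denotes the $i$-th channel of the depthwise output, and the kernel size is $1 \times 1$, so each $W_{ij}^{1\times1}$ is a single scalar weight.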
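Because the kernel is 1×1, the formula above reduces to a per-pixel weighted sum over channels. The following plain-Python sketch illustrates this; the function name `pointwise_conv`, the nested-list layout, and the toy values are assumptions for illustration.

```python
def pointwise_conv(x, weights):
    """1x1 convolution: every output pixel is a weighted sum over input channels.

    x:       list of C_in channels, each an H x W nested list
    weights: C_out x C_in nested list; weights[j][i] plays the role of W_ij
    """
    c_in = len(x)
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(weights[j][i] * x[i][r][c] for i in range(c_in))
             for c in range(w)]
            for r in range(h)
        ]
        for j in range(len(weights))
    ]


# Two 2x2 input channels mapped to three output channels (toy values)
x = [
    [[1, 2], [3, 4]],      # channel 0
    [[10, 20], [30, 40]],  # channel 1
]
weights = [
    [1, 0],   # output 0 copies channel 0
    [0, 1],   # output 1 copies channel 1
    [1, 1],   # output 2 sums both channels
]
y = pointwise_conv(x, weights)
print(len(y))  # 3
print(y[2])    # [[11, 22], [33, 44]]
```

The spatial size is unchanged and only the channel dimension is remixed, which is why this step can change the channel count from $C_{in}$ to $C_{out}$.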

4.2 Parameters and computation

Parameter count:
$$ P_{PW} = C_{in} \times C_{out} $$
Computation:
$$ F_{PW} = H' \times W' \times C_{in} \times C_{out} $$

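5. The Full Depthwise Separable Convolution (DW + PW)

Depthwise separable convolution splits a standard convolution into two steps:

Depthwise Conv: extracts spatial features from each channel independently;
Pointwise Conv: a 1×1 convolution that mixes channels to produce new channel combinations.

🧮 Total parameters and computation: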

$$ \begin{aligned} P_{DSC} &= K^2 \times C_{in} + C_{in} \times C_{out} \\ F_{DSC} &= H' \times W' \times (K^2 \times C_{in} + C_{in} \times C_{out}) \end{aligned} $$

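Compared with standard convolution:
$$ \text{computation ratio} = \frac{F_{DSC}}{F_{standard}} = \frac{K^2 \times C_{in} + C_{in} \times C_{out}}{K^2 \times C_{in} \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{K^2} $$

For example, with $K=3$ and $C_{out}=128$:
$$ \frac{F_{DSC}}{F_{standard}} = \frac{1}{128} + \frac{1}{9} \approx 0.12 $$
That is, only about 12% of the computation!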
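The ratio can be checked with exact arithmetic. In the sketch below, the function name `separable_to_standard_ratio` and the choice $C_{in}=64$ are assumptions for illustration; the result does not depend on $C_{in}$ because it cancels out.

```python
from fractions import Fraction


def separable_to_standard_ratio(k, c_in, c_out):
    """Exact cost ratio F_DSC / F_standard; the H' x W' factor cancels out."""
    separable = k * k * c_in + c_in * c_out
    standard = k * k * c_in * c_out
    return Fraction(separable, standard)


ratio = separable_to_standard_ratio(k=3, c_in=64, c_out=128)
# The closed form 1/C_out + 1/K^2 gives the same value, independent of c_in
assert ratio == Fraction(1, 128) + Fraction(1, 9)
print(ratio)                    # 137/1152
print(round(float(ratio), 3))   # 0.119
```

For a 3×3 kernel the $1/K^2$ term dominates, so the saving approaches a factor of 9 as $C_{out}$ grows.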

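6. Use in MobileNet

6.1 The replacement idea

MobileNet V1 (2017) adopts depthwise separable convolution for nearly all of its layers:

Standard Conv → Depthwise Conv + Pointwise Conv

After the replacement, each layer has the structure:

Input → DWConv → BN → ReLU → PWConv → BN → ReLU → Output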

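6.2 Results in practice

| Model | Top-1 accuracy (ImageNet) | Computation (FLOPs) | Parameters | FLOPs reduction vs. VGG-16 |
| --- | --- | --- | --- | --- |
| VGG-16 | 71.5% | 15.3B | 138M | ×1 |
| MobileNet V1 | 70.6% | 0.57B | 4.2M | ≈27× fewer |

Depthwise separable convolution gives up a little accuracy in exchange for an order-of-magnitude gain in efficiency, which suits mobile and embedded platforms in particular.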

7. PyTorch Implementation (Two Versions)

7.1 Depthwise convolution via the `groups` parameter

import torch
import torch.nn as nn

# Depthwise convolution followed by pointwise convolution
class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Depthwise: groups=in_ch gives every input channel its own kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution that fuses channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Note: BN and ReLU are applied once, after the pointwise step
        x = self.depthwise(x)
        x = self.pointwise(x)
        x = self.bn(x)
        return self.relu(x)

# Example
x = torch.randn(1, 32, 112, 112)
model = DepthwiseSeparableConv(32, 64)
print(model(x).shape)  # torch.Size([1, 64, 112, 112])
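The class above applies BN and ReLU only once, after the pointwise step, whereas the layer order listed in section 6.1 puts BN and ReLU after each of the two convolutions. A sketch of that ordering might look as follows; the class name `MobileNetV1StyleBlock` is an assumption for illustration, and this is not the reference MobileNet code.

```python
import torch
import torch.nn as nn


class MobileNetV1StyleBlock(nn.Module):
    """DWConv -> BN -> ReLU -> PWConv -> BN -> ReLU."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise 3x3: one kernel per input channel
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Pointwise 1x1: mixes channels and maps in_ch -> out_ch
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


x = torch.randn(2, 32, 56, 56)
block = MobileNetV1StyleBlock(32, 64, stride=2)
print(block(x).shape)  # torch.Size([2, 64, 28, 28])
```

Placing the stride on the depthwise convolution means the spatial downsampling happens before the more expensive pointwise step.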

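7.2 Minimal version (the two steps only)

import torch.nn as nn

# Minimal variant: the two convolutions only, no BN or activation
class DW_PW_Custom(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise 3x3: one kernel per input channel (bias left at its default, True)
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # Pointwise 1x1: mixes channels and maps in_ch -> out_ch
        self.pw = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pw(self.dw(x))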
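The parameter formulas from section 5 can be confirmed against PyTorch by counting the weights of actual layers. This is a small sketch; the helper `count_params` and the layer sizes are assumptions for illustration, and `bias=False` is used throughout so that the counts match the bias-free formulas.

```python
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


in_ch, out_ch, k = 32, 64, 3

# bias=False everywhere so the counts match the bias-free formulas
standard = nn.Conv2d(in_ch, out_ch, k, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch, bias=False),  # depthwise
    nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # pointwise
)

print(count_params(standard))   # 18432 = 3*3*32*64
print(count_params(separable))  # 2336  = 3*3*32 + 32*64
```

Here the separable version uses about 12.7% of the parameters, in line with $1/C_{out} + 1/K^2 = 1/64 + 1/9$.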

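8. Pros and Cons

| Pros | Cons |
| --- | --- |
| Far fewer parameters and much less computation (roughly 1/8 to 1/9 with a 3×3 kernel) | Less cross-channel feature interaction, so somewhat weaker representational power |
| Highly parallelizable and well suited to mobile deployment | A weaker fit for tasks with complex inter-channel relationships |
| Combines easily with BN and activation layers | Noticeable accuracy drop when used on its own |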