1. Popular Models
1.1 LeNet-5
Features:
- Local connections and weight sharing
- C layers: convolution
- Output $y = \sigma\left(\sum_{i=1}^{p}\sum_{j=1}^{p} w_{ij}\, x_{ij} + b\right)$, where $p$ is the patch size, $\sigma$ is the sigmoid function, and $w_{ij}$, $b$ are trainable parameters
- S layers: subsampling (avg pooling)
- Output $y = \frac{1}{q^2}\sum_{i=1}^{q}\sum_{j=1}^{q} x_{ij}$, where $q$ is the pooling size
- Output layer: RBF units
- Loss Function: maximum likelihood with modifications
- Train with back-prop
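A minimal PyTorch sketch of this layout (layer sizes follow the original paper; sigmoid activations and average pooling match the notes, and the RBF output layer is replaced by a plain linear layer, as is common in reimplementations):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # C1: 32x32 -> 28x28
            nn.Sigmoid(),
            nn.AvgPool2d(2),                  # S2: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # C3: 14x14 -> 10x10
            nn.Sigmoid(),
            nn.AvgPool2d(2),                  # S4: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # C5
            nn.Sigmoid(),
            nn.Linear(120, 84),               # F6
            nn.Sigmoid(),
            nn.Linear(84, num_classes),       # output (RBF units in the paper)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```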
1.2 AlexNet
Features:
- Classification: 1000 classes, 1.2 million training images.
- In total: 60 million parameters
1.3 VGG net
Features:
- $3\times 3$ filters are used extensively; two stacked $3\times 3$ convolutions cover the same $5\times 5$ receptive field as one $5\times 5$ convolution with fewer parameters (see the check below)
- GPU implementation
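A quick check of that parameter claim, with an illustrative channel count $C$ (my own arithmetic, biases ignored):

```python
# Parameters of conv layers with C input and C output channels (no bias):
C = 64
one_5x5 = 5 * 5 * C * C          # a single 5x5 convolution
two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 convs, same 5x5 receptive field
print(one_5x5, two_3x3)          # 102400 vs 73728 -> ~28% fewer parameters
```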
1.4 GoogLeNet
Features:
- 22 weight layers; filters of multiple sizes within the same layer
- Small filters ($1\times 1$, $3\times 3$, $5\times 5$); $1\times 1$ convolutions are used to reduce the number of channels (see the sketch after this list)
- Two auxiliary classifiers to help gradients propagate during back-propagation
- A CPU-based implementation on a distributed system
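A minimal sketch of the $1\times 1$ channel-reduction trick (my own illustration; the channel counts are arbitrary, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)          # 256-channel feature map

# Direct 5x5 conv: 5*5*256*64 = 409,600 weights
direct = nn.Conv2d(256, 64, kernel_size=5, padding=2)

# 1x1 bottleneck first: 1*1*256*32 + 5*5*32*64 = 8,192 + 51,200 = 59,392 weights
reduced = nn.Sequential(
    nn.Conv2d(256, 32, kernel_size=1),   # shrink channels 256 -> 32
    nn.Conv2d(32, 64, kernel_size=5, padding=2),
)

assert direct(x).shape == reduced(x).shape  # same output shape, ~7x fewer weights
```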
1.5 ResNet
Intuitively, more layers should give better results. Consider two models A and B, where B is A with two extra layers appended.
For the extra two layers in model B:
- If they are identity mappings, then A and B are equivalent
- If they include identity mapping as special cases, the capacity of B is larger than A
Thus the error of B should not be larger than that of A.
However, empirical results show otherwise: the deeper model can have higher training error.
The reason may be that it is difficult for nonlinear layers to approximate the identity mapping. If this is the case, we can build the identity mapping in explicitly.
The nonlinear mapping from input $x$ to output $H(x)$ is split into two parts: $H(x) = F(x) + x$.
Then the (two) weight layers are learning $F(x) = H(x) - x$, that is, the residual.
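A minimal sketch of a basic residual block under these definitions (two weight layers plus an identity shortcut; batch norm placement follows the common design, and channel counts are illustrative):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Computes H(x) = F(x) + x, where F is two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: the convs learn F(x) = H(x) - x
```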
1.6 DenseNet
Features:
- Each layer takes all preceding feature-maps as input, which are concatenated together.
- An $L$-layer net has $\frac{L(L+1)}{2}$ connections
- Each layer outputs $k$ feature maps, and the growth rate $k$ is small (e.g. $k = 12$)
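A minimal dense-block sketch showing the concatenation pattern (the BN-ReLU-Conv ordering follows the paper; growth rate and depth here are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # layer i sees all preceding feature maps, concatenated
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # concat all previous outputs
            features.append(out)                     # this output feeds every later layer
        return torch.cat(features, dim=1)
```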
2. Lightweight Models
2.1 Depthwise Separable Convolution
For standard convolution with a $D_K \times D_K$ kernel, $M$ input channels, $N$ output channels, and a $D_F \times D_F$ output feature map, the number of parameters is $D_K \cdot D_K \cdot M \cdot N$. The computation cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$.

For depth-wise convolution, the number of parameters is $D_K \cdot D_K \cdot M$ and the computation cost is $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$. For point-wise convolution, the number of parameters is $M \cdot N$ and the computation cost is $M \cdot N \cdot D_F \cdot D_F$.

As for the number of parameters:
$$\frac{D_K \cdot D_K \cdot M + M \cdot N}{D_K \cdot D_K \cdot M \cdot N} = \frac{1}{N} + \frac{1}{D_K^2}$$
As for the computation cost:
$$\frac{(D_K \cdot D_K \cdot M + M \cdot N) \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$
A typical setting is $D_K = 3$, then the above ratio is about $1/9$.
Implementation
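A minimal PyTorch sketch: depth-wise convolution is expressed as a grouped convolution with `groups` equal to the number of input channels, followed by a $1\times 1$ point-wise convolution (channel counts are illustrative):

```python
import torch
import torch.nn as nn

M, N, D_K = 32, 64, 3                     # input channels, output channels, kernel size
x = torch.randn(1, M, 56, 56)

depthwise = nn.Conv2d(M, M, D_K, padding=1, groups=M, bias=False)  # one filter per channel
pointwise = nn.Conv2d(M, N, 1, bias=False)                         # 1x1 conv mixes channels

y = pointwise(depthwise(x))
print(y.shape)  # torch.Size([1, 64, 56, 56])

# Parameter check against the formulas above:
params = sum(p.numel() for p in depthwise.parameters()) \
       + sum(p.numel() for p in pointwise.parameters())
standard = D_K * D_K * M * N
print(params, standard, params / standard)  # 2336 vs 18432, ratio ~ 1/N + 1/D_K^2 ≈ 0.127
```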
2.2 Group Convolution
- Deals with the limited memory of GPUs (the original motivation in AlexNet)
- Reduces the computational cost and enhances the performance (ResNeXt)
- ShuffleNet: adds a channel shuffle after point-wise group convolutions so that information can flow across groups (see the sketch below)
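A minimal sketch of a group convolution plus the channel-shuffle operation (my own illustration; the group count is arbitrary):

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels so the next group conv sees channels from every group."""
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

M, N, g = 32, 64, 4
x = torch.randn(1, M, 28, 28)

# Grouped conv: each group maps M/g input channels to N/g output channels,
# so parameters drop from D_K*D_K*M*N to D_K*D_K*M*N/g.
grouped = nn.Conv2d(M, N, kernel_size=3, padding=1, groups=g, bias=False)

y = channel_shuffle(grouped(x), groups=g)
print(y.shape)  # torch.Size([1, 64, 28, 28])
```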