1. Convolutional Layer
Intro: LeNet5 (LeCun et al. 1998)
LeNet5 includes two kinds of layers:
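- C layers: convolution
  - Output $y_{ij} = \sigma\left(\sum_{p=1}^{k}\sum_{q=1}^{k} w_{pq}\, x_{i+p-1,\,j+q-1} + b\right)$, where $k \times k$ is the patch size, $\sigma(\cdot)$ is the sigmoid function, and $w_{pq}$ and $b$ are parameters
- S layers: subsampling (avg pooling)
  - Output $y_{ij} = \frac{1}{s^2}\sum_{p=1}^{s}\sum_{q=1}^{s} x_{s(i-1)+p,\,s(j-1)+q}$, where $s \times s$ is the pooling size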
1.1 Similarity
1D Correlation calculation
Suppose there are two 1D sequences A and B where the length of B is smaller than that of A.
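The cosine similarity between two vectors $\mathbf{x}$ and $\mathbf{y}$ is
$\frac{\mathbf{x}^\top \mathbf{y}}{\|\mathbf{x}\|\, \|\mathbf{y}\|}$, which reduces to $\mathbf{x}^\top \mathbf{y}$ if the two vectors have unit length.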
Naively, we could slide B on A and calculate the similarity one by one.
This process can be slow. However, it can be accelerated on a GPU because the similarity calculations at different positions do not depend on each other.
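The naive sliding procedure can be sketched in Python with NumPy. The function name sliding_cosine_1d and the toy sequences are illustrative assumptions. A small eps is added to the denominator so that an all-zero window does not cause a division by zero.

```python
import numpy as np

def sliding_cosine_1d(a, b, eps=1e-12):
    """Cosine similarity between b and every length-len(b) window of a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    k = len(b)
    sims = np.empty(len(a) - k + 1)
    for n in range(len(sims)):
        window = a[n:n + k]
        # Each position is computed independently of the others,
        # which is why this loop is easy to parallelize (e.g. on a GPU).
        sims[n] = window @ b / (np.linalg.norm(window) * np.linalg.norm(b) + eps)
    return sims

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 1.0])  # sequence A
b = np.array([1.0, 2.0, 1.0])                 # shorter sequence B
sims = sliding_cosine_1d(a, b)
print(sims.round(3))  # the best match is at n = 1, where the window equals b
```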
2D correlation calculation
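Suppose there are two 2D images A and B where the size of B is smaller than that of A. The cosine similarity between two matrices $\mathbf{X}$ and $\mathbf{Y}$ is
$\frac{\sum_{i,j} x_{ij}\, y_{ij}}{\|\mathbf{X}\|_F\, \|\mathbf{Y}\|_F}$, which reduces to $\sum_{i,j} x_{ij}\, y_{ij}$ if the two matrices have unit Frobenius norm.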
Naively, we could slide B on A and calculate the similarity one by one.
As in the 1D case, this process can be slow, and it can be accelerated on a GPU for the same reason.
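The 2D version of the naive procedure is sketched below with NumPy. The helper name sliding_cosine_2d and the toy images are assumptions for illustration. For a 2D array, np.linalg.norm returns the Frobenius norm by default.

```python
import numpy as np

def sliding_cosine_2d(A, B, eps=1e-12):
    """Cosine similarity between B and every B-sized patch of A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    kh, kw = B.shape
    sim = np.empty((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    b_norm = np.linalg.norm(B)  # Frobenius norm of the template
    for i in range(sim.shape[0]):
        for j in range(sim.shape[1]):
            patch = A[i:i + kh, j:j + kw]
            # Every (i, j) is independent, so all positions could run in parallel.
            sim[i, j] = np.sum(patch * B) / (np.linalg.norm(patch) * b_norm + eps)
    return sim

A = np.zeros((5, 5))
A[2:4, 1:3] = [[1.0, 2.0], [3.0, 4.0]]  # hide a copy of B inside A
B = np.array([[1.0, 2.0], [3.0, 4.0]])
sim = sliding_cosine_2d(A, B)
i, j = np.unravel_index(np.argmax(sim), sim.shape)
print(i, j)  # location of the most similar patch
```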
1.2 1D Convolution
Calculation
Continuous convolution: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
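Discrete convolution (for finite-length sequences $\mathbf{x}$ of length $N$ and $\mathbf{w}$ of length $K$, $K \le N$): $(\mathbf{x} * \mathbf{w})_n = \sum_{k} x_k\, w_{n-k+1}$, where the sum runs over all $k$ for which both indices are in range.
Note that the element orders are reversed in $\mathbf{x}$ and $\mathbf{w}$: as the index of $\mathbf{x}$ increases, the index of $\mathbf{w}$ decreases.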
There are three kinds of convolution, which yield different output shapes:
- Valid: length $N - K + 1$
- Full: length $N + K - 1$
- Same: length $N$
The "Same" convolution can also be obtained by a "valid" convolution of $\mathbf{w}$ with a zero-padded $\mathbf{x}$ (padded with $K - 1$ zeros in total, split between the two ends), since most DL frameworks only implement the "valid" convolution.
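Besides, note that when calculating the convolution with stride 1 after padding $P$ zeros on each end of $\mathbf{x}$, the output size is $N + 2P - K + 1$ ($P = 0$ gives "valid", $P = K - 1$ gives "full").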
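The three output shapes can be checked with numpy.convolve, which supports the modes "full", "valid" and "same". The toy sequences are assumptions. The "same" result is also rebuilt from a "valid" convolution on a zero-padded input. That reconstruction assumes an odd kernel length K, so that the K - 1 padding zeros split evenly between both ends.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # length N = 5
w = np.array([1.0, 0.0, -1.0])           # length K = 3 (odd)
N, K = len(x), len(w)

full = np.convolve(x, w, mode="full")    # length N + K - 1 = 7
valid = np.convolve(x, w, mode="valid")  # length N - K + 1 = 3
same = np.convolve(x, w, mode="same")    # length N = 5

# "Same" from "valid": pad P = (K - 1) // 2 zeros on each end of x (odd K assumed),
# so the output length is N + 2P - K + 1 = N.
P = (K - 1) // 2
same_via_valid = np.convolve(np.pad(x, P), w, mode="valid")

print(len(full), len(valid), len(same), len(same_via_valid))
```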
Relationship between 1D similarity and 1D convolution
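Calculating the similarity (inner product) between the sequence $\mathbf{w}$ and each part of the sequence $\mathbf{x}$ is equivalent to calculating the "valid" convolution $\mathbf{x} * \tilde{\mathbf{w}}$, where $\tilde{w}_k = w_{K-k+1}$, i.e., $\tilde{\mathbf{w}}$ is $\mathbf{w}$ flipped.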
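This equivalence can be checked numerically with NumPy. The toy sequences are assumptions, and w is made asymmetric so that the flip actually matters. The sketch uses the raw inner product, which equals the cosine similarity only when every window and w are normalized to unit length. numpy.correlate computes the same sliding inner product without any explicit flip.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 1.0])  # long sequence (A)
w = np.array([1.0, 2.0, 3.0])                 # short sequence (B), deliberately asymmetric

# Inner product of w with every length-K window of x, computed directly.
direct = np.array([x[n:n + len(w)] @ w for n in range(len(x) - len(w) + 1)])

# The same values from a "valid" convolution of x with the flipped w.
via_conv = np.convolve(x, np.flip(w), mode="valid")

# NumPy's correlate computes this sliding inner product directly (no flip).
via_corr = np.correlate(x, w, mode="valid")

# Without the flip, convolution gives different numbers for an asymmetric w.
no_flip = np.convolve(x, w, mode="valid")
print(direct, via_conv, no_flip)
```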
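The flip operation can be realized by numpy.flip(), i.e., flipping the vector along axis 0 (equivalently w[::-1]). For a vector stored as a 2D row or column array, applying numpy.rot90() twice, i.e., numpy.rot90(w, 2) (a 180° rotation), gives the same result. Note that numpy.rot90() does not accept a 1D array.
1.3 2D convolution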
Calculation
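Suppose that there are two matrices $\mathbf{X}$ and $\mathbf{W}$ with sizes $M \times N$ and $K \times L$, respectively, where $K \le M$ and $L \le N$.
Discrete convolution of the two matrices:
$(\mathbf{X} * \mathbf{W})_{ij} = \sum_{m}\sum_{n} x_{mn}\, w_{i-m+1,\,j-n+1}$, where the sums run over all $m, n$ for which the indices are in range.
For example, when $\mathbf{W}$ is $2 \times 2$, the first element of the "valid" convolution is $x_{11}w_{22} + x_{12}w_{21} + x_{21}w_{12} + x_{22}w_{11}$.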
There are also three kinds of 2D convolution, which yield different output shapes:
- Valid: shape $(M - K + 1) \times (N - L + 1)$
- Full: shape $(M + K - 1) \times (N + L - 1)$
- Same: shape $M \times N$
Similarly, the "Same" 2D convolution can be realized by a "Valid" 2D convolution of $\mathbf{W}$ with a zero-padded $\mathbf{X}$. The shape of the padded $\mathbf{X}$ is $(M + K - 1) \times (N + L - 1)$.
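The three modes can be sketched by building everything on a single "valid" routine, as most frameworks do. The helper names conv2d_valid and conv2d are hypothetical. The "same" branch assumes odd kernel sizes so that the padding is symmetric. The first printed element reproduces the $2 \times 2$ example from the text.

```python
import numpy as np

def conv2d_valid(X, W):
    """'Valid' 2D convolution: flip W along both axes, then slide it over X."""
    K, L = W.shape
    Wf = W[::-1, ::-1]
    out = np.empty((X.shape[0] - K + 1, X.shape[1] - L + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + K, j:j + L] * Wf)
    return out

def conv2d(X, W, mode="valid"):
    """'valid', 'full' or 'same' 2D convolution, each computed as a 'valid'
    convolution of W with a zero-padded X ('same' assumes odd kernel sizes)."""
    K, L = W.shape
    if mode == "full":
        X = np.pad(X, ((K - 1, K - 1), (L - 1, L - 1)))
    elif mode == "same":
        X = np.pad(X, (((K - 1) // 2, (K - 1) // 2), ((L - 1) // 2, (L - 1) // 2)))
    return conv2d_valid(X, W)

X = np.arange(16.0).reshape(4, 4)        # M x N = 4 x 4
W = np.array([[1.0, 2.0], [3.0, 4.0]])   # K x L = 2 x 2
print(conv2d(X, W, "valid").shape)       # (3, 3)
print(conv2d(X, W, "full").shape)        # (5, 5)
print(conv2d(X, W, "valid")[0, 0])       # x11*w22 + x12*w21 + x21*w12 + x22*w11 = 16.0
```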
Relationship between 2D similarity and 2D convolution
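Calculating the similarity between the matrix $\mathbf{W}$ and each part of the matrix $\mathbf{X}$ is equivalent to calculating the "valid" convolution $\mathbf{X} * \tilde{\mathbf{W}}$, where $\tilde{w}_{pq} = w_{K-p+1,\,L-q+1}$.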
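The 2D equivalence can be checked numerically with NumPy. Here conv2d_valid implements the "valid" 2D convolution straight from its index definition (0-based indices), and correlate2d_valid computes the inner product with every patch. The two agree once the kernel is rotated by 180°. Both helper names and the random test matrices are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(X, W):
    """'Valid' 2D convolution from the definition (0-based indices):
    y[i, j] = sum_{p,q} W[p, q] * X[i + K - 1 - p, j + L - 1 - q]."""
    K, L = W.shape
    M, N = X.shape
    Y = np.zeros((M - K + 1, N - L + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            for p in range(K):
                for q in range(L):
                    Y[i, j] += W[p, q] * X[i + K - 1 - p, j + L - 1 - q]
    return Y

def correlate2d_valid(X, W):
    """Inner product of W with every K x L patch of X (no flipping)."""
    K, L = W.shape
    M, N = X.shape
    return np.array([[np.sum(X[i:i + K, j:j + L] * W) for j in range(N - L + 1)]
                     for i in range(M - K + 1)])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))
W = rng.standard_normal((2, 3))
W_rot = np.rot90(W, 2)  # rotate by 180 degrees

print(np.array_equal(W_rot, np.flip(W, axis=(0, 1))))                 # True
print(np.allclose(conv2d_valid(X, W_rot), correlate2d_valid(X, W)))   # True
```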
The flip operation can be realized by applying numpy.rot90() twice, i.e., numpy.rot90(W, 2) (a 180° rotation). This is equivalent to flipping the matrix along axes 0 and 1 (numpy.flip(W, axis=(0, 1))).
Effect of 2D convolution
As shown in the figure above, the higher (brighter) a pixel value in the feature map, the more similar the filter is to the corresponding patch of the image.
Convolution saves the number of parameters
As shown in the figure above (left), each feature map in a convolutional layer has only 25 parameters, because all positions share the same kernel. The total number of parameters is therefore 25 times the number of feature maps. In an MLP (right), each neuron is connected to every input pixel and has 1024 parameters, so the total number of parameters is 1024 times the number of neurons.
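The saving can be made concrete with a small calculation. The sizes are assumptions chosen to match the 25 and 1024 in the text: a 5 × 5 kernel and a 32 × 32 single-channel input. The number of feature maps (six) is an arbitrary choice, and biases are ignored. The fully connected layer is given the same number of outputs as the convolutional layer so the two are comparable.

```python
# Hypothetical sizes: a 32 x 32 single-channel input and 5 x 5 kernels (25 weights each).
in_h, in_w = 32, 32
k = 5
n_maps = 6                                  # hypothetical number of feature maps
out_h, out_w = in_h - k + 1, in_w - k + 1   # "valid" convolution -> 28 x 28 per map

# Convolution: each map shares one 5 x 5 kernel across all positions.
conv_params = n_maps * k * k                # 6 * 25 = 150

# Fully connected layer with the same number of outputs (6 * 28 * 28 = 4704 neurons),
# each neuron having one weight per input pixel (32 * 32 = 1024).
fc_params = n_maps * out_h * out_w * in_h * in_w   # 4704 * 1024 = 4,816,896

print(conv_params, fc_params)
```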
1.4 3D convolution
We assume the number of channels in the input is the same as that in the kernel (filter).
As shown in the process above, we correlate each 2D channel of the input with the corresponding 2D slice of the 3D kernel, then sum over all channels to yield one feature map. This can be realized by flipping the 3D kernel along all three axes and performing a "valid" 3D convolution.
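Note that the number of parameters in this layer is $D \times (C \times K \times L + 1)$. Here $C$ is the number of input channels, $K \times L$ is the kernel size, and $D$ is the number of kernels (output feature maps), each with one bias.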
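A multi-channel convolutional layer can be sketched directly in correlation form with NumPy. Each output value is the inner product of a $C \times K \times L$ input block with one kernel, plus that kernel's bias. The (D, C, K, L) kernel layout, the function name conv_layer and the random sizes are assumptions for illustration. The sketch also counts the parameters as $D \times (C \times K \times L + 1)$.

```python
import numpy as np

def conv_layer(X, Wk, b):
    """Multi-channel convolutional layer in correlation form (hypothetical layout).
    X: (C, M, N) input, Wk: (D, C, K, L) kernels, b: (D,) biases -> (D, M-K+1, N-L+1)."""
    C, M, N = X.shape
    D, Ck, K, L = Wk.shape
    assert C == Ck, "kernel depth must equal the number of input channels"
    Y = np.empty((D, M - K + 1, N - L + 1))
    for d in range(D):
        for i in range(M - K + 1):
            for j in range(N - L + 1):
                # Correlate every channel with its kernel slice and sum over channels.
                Y[d, i, j] = np.sum(X[:, i:i + K, j:j + L] * Wk[d]) + b[d]
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6, 6))       # C = 3 channels of size 6 x 6
Wk = rng.standard_normal((4, 3, 3, 3))   # D = 4 kernels of size 3 x 3 x 3
b = rng.standard_normal(4)
Y = conv_layer(X, Wk, b)
n_params = Wk.size + b.size              # D * (C * K * L + 1) = 4 * (27 + 1) = 112
print(Y.shape, n_params)
```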
2. Pooling layer
Calculation
Divide the convolved features into disjoint regions, and take the mean (or maximum) feature activation over these regions to obtain the pooled features.
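Pooling over disjoint regions can be sketched with a NumPy reshape. The function name pool2d is hypothetical, and the sketch assumes both sides of the feature map are divisible by the pooling size s.

```python
import numpy as np

def pool2d(X, s, op="mean"):
    """Pooling over disjoint s x s regions; assumes both sides of X are divisible by s."""
    M, N = X.shape
    regions = X.reshape(M // s, s, N // s, s)  # regions[a, :, c, :] is one s x s region
    if op == "mean":
        return regions.mean(axis=(1, 3))
    if op == "max":
        return regions.max(axis=(1, 3))
    raise ValueError(f"unknown op: {op}")

X = np.arange(16.0).reshape(4, 4)
avg = pool2d(X, 2, "mean")   # mean of each 2 x 2 region
mx = pool2d(X, 2, "max")     # maximum of each 2 x 2 region
print(avg)
print(mx)
```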
Functions of Pooling
- Reduce the number of features for final classification
- Enlarge the effective region of features in the next layer
  - A feature learned in the pooled maps will have larger effective regions in the pixel space
- Realize invariance
  - After pooling, features tend to be translation invariant in local regions
This process is similar to the receptive fields of visual neurons, whose sizes increase along the visual hierarchy.
3. BP for Conv. layer and Pooling layer
3.1 BP for Conv. layer
Upstream Gradients
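If layer $l$ is a convolutional layer, consider one single feature map. Its net input is $u^{(l)}_{ij} = \sum_{p=1}^{K}\sum_{q=1}^{L} w^{(l)}_{pq}\, x^{(l-1)}_{i+p-1,\,j+q-1} + b^{(l)}$, its output is $x^{(l)}_{ij} = f(u^{(l)}_{ij})$, and $\delta^{(l)}_{ij} = \partial E / \partial u^{(l)}_{ij}$.
Upstream gradients of $x^{(l-1)}_{mn}$: $\frac{\partial E}{\partial x^{(l-1)}_{mn}} = \sum_{i,j} \delta^{(l)}_{ij}\, w^{(l)}_{m-i+1,\,n-j+1}$ (summing over all $i, j$ for which the kernel indices are in range)
Upstream gradients of $u^{(l-1)}_{mn}$: $\delta^{(l-1)}_{mn} = \frac{\partial E}{\partial x^{(l-1)}_{mn}}\, f'(u^{(l-1)}_{mn})$
Similarly we can obtain the gradients of all other elements. In matrix form, the upstream gradients are $\boldsymbol\delta^{(l-1)} = \left(\boldsymbol\delta^{(l)} * \mathbf{W}^{(l)}\right) \odot f'(\mathbf{u}^{(l-1)})$, where $*$ is the "full" convolution.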
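Gradients for weights and biases
$\frac{\partial E}{\partial w^{(l)}_{pq}} = \sum_{i,j} \delta^{(l)}_{ij}\, x^{(l-1)}_{i+p-1,\,j+q-1}$, i.e., the "valid" correlation of $\mathbf{x}^{(l-1)}$ with $\boldsymbol\delta^{(l)}$; $\quad \frac{\partial E}{\partial b^{(l)}} = \sum_{i,j} \delta^{(l)}_{ij}$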
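The conv-layer gradients can be checked against finite differences with NumPy. The sketch makes several assumptions: a single input map and a single feature map, a forward pass in correlation form, an identity activation (so $\delta = \partial E/\partial u$), and a squared-error loss against a random hypothetical target T. Under these assumptions, the input gradient is a "full" convolution of delta with W. The weight gradient is a "valid" correlation of the input with delta, and the bias gradient is the sum of delta. All helper names are illustrative.

```python
import numpy as np

def correlate_valid(X, W):
    """Inner product of W with every patch of X ('valid' correlation, no flipping)."""
    K, L = W.shape
    return np.array([[np.sum(X[i:i + K, j:j + L] * W)
                      for j in range(X.shape[1] - L + 1)]
                     for i in range(X.shape[0] - K + 1)])

def conv_full(D, W):
    """'Full' 2D convolution: zero-pad D, then correlate with W rotated by 180 degrees."""
    K, L = W.shape
    return correlate_valid(np.pad(D, ((K - 1, K - 1), (L - 1, L - 1))), np.rot90(W, 2))

def forward(X, W, b):
    # Net input of one feature map (correlation form); identity activation here.
    return correlate_valid(X, W) + b

def backward(X, W, delta):
    dX = conv_full(delta, W)        # dE/dX: full convolution of delta with W
    dW = correlate_valid(X, delta)  # dE/dW: X correlated with delta
    db = delta.sum()                # dE/db: sum of delta over all positions
    return dX, dW, db

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))
b = 0.1
T = rng.standard_normal((3, 3))     # hypothetical target map

def loss(X_, W_, b_):
    return 0.5 * np.sum((forward(X_, W_, b_) - T) ** 2)

delta = forward(X, W, b) - T        # dE/du for this squared-error loss
dX, dW, db = backward(X, W, delta)

def numerical_grad(f, A, h=1e-6):
    """Central finite differences of the scalar function f at array A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        A_plus, A_minus = A.copy(), A.copy()
        A_plus[idx] += h
        A_minus[idx] -= h
        G[idx] = (f(A_plus) - f(A_minus)) / (2 * h)
    return G

print(np.allclose(dX, numerical_grad(lambda A: loss(A, W, b), X), atol=1e-6))
print(np.allclose(dW, numerical_grad(lambda A: loss(X, A, b), W), atol=1e-6))
```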
3.2 BP for Pooling layer
Average pooling layer
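Upstream gradient in the vector form: $\boldsymbol\delta^{(l-1)} = \mathrm{up}(\boldsymbol\delta^{(l)}) \odot f'(\mathbf{u}^{(l-1)})$
where $\mathrm{up}(\boldsymbol\delta) = \frac{1}{s^2}\, \boldsymbol\delta \otimes \mathbf{1}_{s \times s}$ copies each element into its $s \times s$ pooling region and scales it by $1/s^2$ ($\otimes$ is the Kronecker product).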
Max pooling layer
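Upstream gradient in the vector form: $\boldsymbol\delta^{(l-1)} = \left[\left(\boldsymbol\delta^{(l)} \otimes \mathbf{1}_{s \times s}\right) \odot \mathbf{M}\right] \odot f'(\mathbf{u}^{(l-1)})$
where $\mathbf{M}$ is a mask with
1 obtained at the maximal value of each pooling region of $\mathbf{x}^{(l-1)}$ and 0 elsewhere.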
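Both pooling backward rules can be sketched in NumPy. The sketch returns $\partial E / \partial \mathbf{x}^{(l-1)}$ and omits the elementwise factor $f'(\mathbf{u}^{(l-1)})$. The function names and the toy input are hypothetical. The max-pooling version assumes each region has a unique maximum; with ties, every tied position would receive the gradient.

```python
import numpy as np

def avg_pool_backward(delta, s):
    """Average pooling: copy each upstream gradient into its s x s region, scaled by 1/s^2."""
    return np.kron(delta, np.ones((s, s))) / (s * s)

def max_pool_backward(X, delta, s):
    """Max pooling: route each upstream gradient to the position of its region's maximum.
    Assumes each region has a unique maximum (ties would all receive the gradient)."""
    M, N = X.shape
    regions = X.reshape(M // s, s, N // s, s)
    mask = regions == regions.max(axis=(1, 3), keepdims=True)  # 1 at the max, 0 elsewhere
    return (mask * delta[:, None, :, None]).reshape(M, N)

X = np.array([[1.0, 2.0, 0.0, 1.0],
              [4.0, 3.0, 5.0, 0.0],
              [0.0, 7.0, 1.0, 1.0],
              [2.0, 0.0, 0.0, 3.0]])  # input of the pooling layer
delta = np.array([[1.0, 2.0],
                  [3.0, 4.0]])        # upstream gradient of the 2 x 2 pooled map
g_avg = avg_pool_backward(delta, 2)
g_max = max_pool_backward(X, delta, 2)
print(g_avg)
print(g_max)
```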
3.3 BP Algorithm summary
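For $l = L, L-1, \dots, 1$ do
- If $l = L$: $\boldsymbol\delta^{(L)} = (\mathbf{x}^{(L)} - \mathbf{t}) \odot f'(\mathbf{u}^{(L)})$ (Euclidean loss) or $\boldsymbol\delta^{(L)} = \mathbf{x}^{(L)} - \mathbf{t}$ (softmax with cross-entropy loss), where $\mathbf{t}$ is the target
- If $l < L$:
  - If layer $l+1$ is a convolutional layer: $\boldsymbol\delta^{(l)}_j = \left(\sum_k \boldsymbol\delta^{(l+1)}_k * \mathbf{W}^{(l+1)}_{jk}\right) \odot f'(\mathbf{u}^{(l)}_j)$ ("full" convolution; $\mathbf{W}^{(l+1)}_{jk}$ connects map $j$ of layer $l$ to map $k$ of layer $l+1$)
  - If layer $l+1$ is a pooling layer (avg. or max.): $\boldsymbol\delta^{(l)}_k = \mathrm{up}(\boldsymbol\delta^{(l+1)}_k) \odot f'(\mathbf{u}^{(l)}_k)$, where $\mathrm{up}(\cdot)$ is the average- or max-pooling upsampling of Section 3.2
  ($j$ and $k$ index feature maps)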
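- Do weight adjustment: $w_{ji} \leftarrow w_{ji} - \alpha \frac{\partial E}{\partial w_{ji}}$, $\; b_j \leftarrow b_j - \alpha \frac{\partial E}{\partial b_j}$
where $w_{ji}$ denotes the connection weight from node $i$ to node $j$, $b_j$ denotes the bias on node $j$ (in any feature map), and $\alpha$ is the learning rate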
Note:
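- The overall gradients are averaged over the training samples $n = 1, \dots, N_{\text{batch}}$ in a mini-batch: $\frac{\partial E}{\partial w} = \frac{1}{N_{\text{batch}}}\sum_{n=1}^{N_{\text{batch}}} \frac{\partial E_n}{\partial w}$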
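- Weight decay is often used: $w \leftarrow w - \alpha\left(\frac{\partial E}{\partial w} + \lambda w\right)$, where $\lambda$ is the weight decay coefficient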
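The weight adjustment step can be sketched for one layer, assuming plain mini-batch gradient descent. Per-sample gradients are averaged, and L2 weight decay with coefficient weight_decay is applied to the weights but not the biases; that is a common convention, not necessarily the one used in these notes. The function name sgd_step and the stacked-gradient layout are hypothetical.

```python
import numpy as np

def sgd_step(w, b, grads_w, grads_b, lr=0.1, weight_decay=1e-4):
    """One gradient-descent update of a weight array w and bias array b.

    grads_w, grads_b: per-sample gradients stacked along axis 0 (hypothetical layout).
    The overall gradient is their mean over the mini-batch; weight decay adds
    weight_decay * w to the weight gradient only (biases are not decayed here).
    """
    g_w = grads_w.mean(axis=0) + weight_decay * w
    g_b = grads_b.mean(axis=0)
    return w - lr * g_w, b - lr * g_b

w = np.ones((2, 2))
b = np.zeros(2)
grads_w = np.stack([np.ones((2, 2)), 3.0 * np.ones((2, 2))])  # two samples
grads_b = np.array([[1.0, 1.0], [3.0, 3.0]])
new_w, new_b = sgd_step(w, b, grads_w, grads_b)
print(new_w, new_b)
```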
4. Construction of CNN
The convolutional layers and pooling layers can be combined freely with other layers that we have discussed before: fully connected layer, sigmoid layer, ReLU layer or other activation layers, Euclidean loss layer, cross-entropy loss layer. They can also be combined with layers we haven't discussed yet: local response normalization layer, dropout layer, batch normalization layer, etc.
The modules can be stacked in various structures:
A training process for a CNN is shown below.