
L6. Convolutional Neural Networks


1. Convolutional Layer

Intro: LeNet5 (LeCun et al. 1998)
[Figure: LeNet5 architecture]
LeNet5 includes two kinds of layers:
  • C layers: convolution
    • Output $y = \varphi\big(\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij} x_{ij} + b\big)$, where $k$ is the patch size, $\varphi$ is the sigmoid function, and $w_{ij}$ and $b$ are parameters
  • S layers: subsampling (avg pooling)
    • Output $y = \frac{1}{p^2}\sum_{i=1}^{p}\sum_{j=1}^{p} x_{ij}$, where $p$ is the pooling size

1.1 Similarity

1D correlation calculation
Suppose there are two 1D sequences A and B, where the length of B is smaller than that of A.
The cosine similarity between two vectors $a$ and $b$ is $\cos\theta = \frac{a^{\top} b}{\|a\|\,\|b\|}$, which reduces to $a^{\top} b$ if the two vectors have unit length.
Naively, we could slide B over A and calculate the similarity position by position.
This process could be slow, but it can be accelerated on a GPU because there is no interaction between the calculations at different positions.
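The sliding calculation above can be sketched in numpy; the helper name and example sequences are our own:

```python
import numpy as np

# Sketch: slide a short template B over a longer sequence A and record
# the cosine similarity at each offset.
def sliding_cosine_similarity(A, B):
    m = len(B)
    b_unit = B / np.linalg.norm(B)
    sims = []
    for i in range(len(A) - m + 1):
        patch = A[i:i + m]
        sims.append(np.dot(patch / np.linalg.norm(patch), b_unit))
    return np.array(sims)

A = np.array([0., 1., 2., 1., 0., 1., 2., 1., 0.])
B = np.array([1., 2., 1.])
sims = sliding_cosine_similarity(A, B)
# Peaks in sims mark the offsets where A locally matches the template B.
```

Each loop iteration is independent of the others, which is exactly why the computation parallelizes well.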
2D correlation calculation
Suppose there are two 2D images A and B, where the size of B is smaller than that of A. The cosine similarity between two matrices $A$ and $B$ is $\frac{\langle A, B \rangle_F}{\|A\|_F\,\|B\|_F}$, where $\langle A, B \rangle_F = \sum_{i,j} A_{ij} B_{ij}$; this reduces to $\langle A, B \rangle_F$ if the two matrices have unit Frobenius norm.
Naively, we could slide B over A and calculate the similarity position by position.
As in the 1D situation, this process could be slow, and we can use a GPU.

1.2 1D Convolution

Calculation
Continuous convolution: $(x * y)(t) = \int_{-\infty}^{\infty} x(\tau)\, y(t - \tau)\, d\tau$
Discrete convolution (for finite-length sequences): $z_n = (x * y)_n = \sum_{m} x_m\, y_{n-m}$
Note that the element orders of $x$ and $y$ are reversed relative to each other: one sequence is flipped before the sliding sum.
There are three kinds of convolution, which yield different output shapes respectively. Let $x$ have length $N$ and $y$ have length $M$, with $M \le N$:
  • Valid: length of $z$ is $N - M + 1$
  • Full: length of $z$ is $N + M - 1$
  • Same: length of $z$ is $N$
The β€œSame” convolution can also be obtained by β€œvalid” convolution of $y$ with a zero-padded $x$ (since most DL frameworks only implement the β€œvalid” convolution method).
Besides, note that when calculating β€œvalid” convolution with stride 1, the output size is $N - M + 1$.
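The three output shapes, and the padding trick for β€œSame”, can be checked with numpy's np.convolve (the example sequences are our own):

```python
import numpy as np

a = np.arange(6, dtype=float)            # x, length N = 6
k = np.array([1., 0., -1.])              # y, length M = 3

full  = np.convolve(a, k, mode='full')   # length N + M - 1 = 8
same  = np.convolve(a, k, mode='same')   # length N = 6
valid = np.convolve(a, k, mode='valid')  # length N - M + 1 = 4

# "Same" realized as a "valid" convolution of the zero-padded input:
# pad (M - 1) // 2 zeros on each side (M is odd here).
same_via_valid = np.convolve(np.pad(a, (1, 1)), k, mode='valid')
```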
Relationship between 1D similarity and 1D convolution
Calculating the similarity between sequence $y$ and each part of sequence $x$ is equivalent to calculating $x * \tilde{y}$, where $\tilde{y}$ is the flipped sequence: $\tilde{y}_m = y_{M-1-m}$.
The above flip operation can be realized by applying the command numpy.rot90() twice (rot180()) or by flipping the vector along axis 0.
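This equivalence can be verified directly in numpy (the example sequences are our own):

```python
import numpy as np

# Sketch: correlating B against every window of A equals a "valid"
# convolution of A with B flipped along axis 0.
A = np.array([3., 1., 4., 1., 5., 9., 2., 6.])
B = np.array([1., 2., 3.])

corr = np.correlate(A, B, mode='valid')          # sliding inner products
conv_flipped = np.convolve(A, B[::-1], mode='valid')
```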

1.3 2D convolution

Calculation
Suppose that there are two matrices $X$ and $K$ with sizes $N \times N$ and $M \times M$, respectively, where $M \le N$.
Discrete convolution of the two matrices: $(X * K)_{ij} = \sum_{m}\sum_{n} X_{mn}\, K_{i-m,\, j-n}$
[Figure: worked example of a small 2D convolution]
There are also three kinds of convolution, which yield different output shapes respectively.
  • Valid: shape of $X * K$ is $(N - M + 1) \times (N - M + 1)$
  • Full: shape of $X * K$ is $(N + M - 1) \times (N + M - 1)$
  • Same: shape of $X * K$ is $N \times N$
Similarly, the β€œSame” 2D convolution can be realized by zero-padding $X$ and then applying β€œValid” 2D convolution. The shape of the padded $X$ is $(N + M - 1) \times (N + M - 1)$.
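A minimal numpy sketch of 2D β€œvalid” convolution, with β€œfull” obtained by padding; the helper conv2d_valid and the example matrices are our own:

```python
import numpy as np

def conv2d_valid(X, K):
    """2D 'valid' convolution: rotate the kernel by 180 deg, then slide and sum."""
    Kf = np.flip(K)                      # flip along both axes
    m, n = Kf.shape
    H, W = X.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i+m, j:j+n] * Kf)
    return out

X = np.arange(25, dtype=float).reshape(5, 5)     # N = 5
K = np.array([[0., 1.], [2., 3.]])               # M = 2

valid = conv2d_valid(X, K)                       # shape (N-M+1, N-M+1) = (4, 4)
full  = conv2d_valid(np.pad(X, 1), K)            # pad M-1 zeros -> shape (6, 6)
```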
Relationship between 2D similarity and 2D convolution
Calculating the similarity between matrix $K$ and each part of matrix $X$ is equivalent to calculating $X * \tilde{K}$, where $\tilde{K}$ is $K$ rotated by 180Β°.
The above operation can be realized by applying the command numpy.rot90() twice (rot180()), which is equivalent to flipping the matrix along axes 0 and 1.
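The equivalence between two rot90 calls and a double flip can be checked on a small matrix (the example is our own):

```python
import numpy as np

B = np.arange(6).reshape(2, 3)

rot180 = np.rot90(B, 2)            # numpy.rot90 applied twice
flipped = np.flip(B, axis=(0, 1))  # flip along axes 0 and 1
```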
Effect of 2D convolution
[Figure: input image, filter, and resulting feature map]
As the figure above shows, the higher (brighter) a pixel value in the feature map, the more similar the filter is to the corresponding patch in the image.
Convolution reduces the number of parameters
[Figure: parameter count of a convolutional layer (left) vs. an MLP (right)]
As shown on the left above, for convolutional layers one feature map has 25 parameters (a 5Γ—5 filter shared across all positions), so the total number of parameters is 25 times the number of feature maps. However, for an MLP, as shown on the right above, one neuron has 1024 parameters (one per input pixel), so the total number of parameters is 1024 times the number of neurons.
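The arithmetic behind this comparison can be sketched directly; the 32Γ—32 input size and the choice of 6 feature maps are our own assumptions (consistent with the figure's counts of 25 and 1024 parameters, and with LeNet5's first C layer):

```python
# Parameter counting sketch; biases are ignored for simplicity.
input_h, input_w = 32, 32        # assumed input image size (32 * 32 = 1024)
filter_h, filter_w = 5, 5        # assumed filter size (5 * 5 = 25)

params_per_feature_map = filter_h * filter_w   # shared across all positions
params_per_mlp_neuron = input_h * input_w      # one weight per input pixel

n_feature_maps = 6                             # assumption, e.g. LeNet5's C1
conv_params = n_feature_maps * params_per_feature_map
```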

1.4 3D convolution

We assume the number of channels in the input is the same as that in the kernel (filter).
[Figure: multi-channel convolution, correlating each channel and summing]
As the process above shows, we correlate each 2D section of the input with the corresponding 2D section of the 3D kernel, then sum over all sections to yield one feature map. This can be realized by flipping the 3D kernel and doing a 3D convolution.
Note that for $C$ input channels and an $M \times M$ kernel, the number of parameters in this layer is $C \times M \times M$ weights per feature map, plus one bias each.
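The per-channel correlate-and-sum process can be sketched as follows; the helper name and random example data are our own:

```python
import numpy as np

# Sketch: a multi-channel ("3D") convolution producing one feature map.
# Each channel of the input is correlated with the matching channel of
# the kernel, then the channel results are summed.
def conv3d_single_map(X, K):
    """X: (C, H, W) input, K: (C, m, m) kernel -> (H-m+1, W-m+1) map."""
    C, H, W = X.shape
    _, m, _ = K.shape
    out = np.zeros((H - m + 1, W - m + 1))
    for c in range(C):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] += np.sum(X[c, i:i+m, j:j+m] * K[c])
    return out

X = np.random.default_rng(0).standard_normal((3, 6, 6))
K = np.random.default_rng(1).standard_normal((3, 3, 3))
fmap = conv3d_single_map(X, K)   # one (4, 4) feature map
```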

2. Pooling layer

Calculation
Divide the convolved features into disjoint regions, and take the mean (or maximum) feature activation over these regions to obtain the pooled features.
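Both pooling variants can be sketched with a reshape trick (our own example; the input sides must be divisible by the pooling size):

```python
import numpy as np

# Sketch: 2x2 average and max pooling over disjoint regions.
X = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 9., 2.],
              [1., 0., 3., 4.]])
p = 2
# Group the matrix into (p x p) blocks, then reduce over each block.
blocks = X.reshape(X.shape[0] // p, p, X.shape[1] // p, p)

avg_pooled = blocks.mean(axis=(1, 3))
max_pooled = blocks.max(axis=(1, 3))
```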
Functions of Pooling
  1. Reduce the number of features for final classification
  2. Enlarge the effective region of features in the next layer
      • A feature learned in the pooled maps will have a larger effective region in the pixel space
  3. Realize invariance
      • After pooling, features tend to be translation invariant in local regions
This process is similar to the receptive fields of visual neurons, whose sizes increase along the visual hierarchy.

3. BP for Conv. layer and Pooling layer

3.1 BP for Conv. layer

Upstream Gradients
If layer $l$ is a convolutional layer, consider one single feature map:
Upstream gradients: each input node collects the deltas of the output nodes it feeds into, weighted by the corresponding kernel weights. For a 1D kernel $(w_1, \dots, w_M)$, $\frac{\partial E}{\partial x_1^{(l-1)}} = \delta_1^{(l)} w_1$ and $\frac{\partial E}{\partial x_2^{(l-1)}} = \delta_1^{(l)} w_2 + \delta_2^{(l)} w_1$.
Similarly we can obtain the gradients of the remaining nodes. The upstream gradient in vector form is the β€œfull” convolution of the delta map with the kernel: $\delta^{(l-1)} = \delta^{(l)} *_{\text{full}} w$.
Gradients for weights and biases: $\frac{\partial E}{\partial w}$ is the β€œvalid” correlation of the layer input with the delta map, $\frac{\partial E}{\partial w_k} = \sum_{j} \delta_j^{(l)}\, x_{j+k-1}^{(l-1)}$, and $\frac{\partial E}{\partial b} = \sum_{j} \delta_j^{(l)}$.
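The weight-gradient rule can be checked numerically in 2D (the helper, the arbitrary upstream gradient G, and the random data are our own):

```python
import numpy as np

# Sketch: the kernel gradient of a "valid" correlation layer is itself
# a "valid" correlation of the input with the upstream gradient.
def corr2d_valid(X, K):
    m, n = K.shape
    H, W = X.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i+m, j:j+n] * K)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
K = rng.standard_normal((3, 3))
G = rng.standard_normal((3, 3))          # upstream gradient dE/dY

analytic = corr2d_valid(X, G)            # claimed dE/dK

# Finite-difference check of one entry, with loss E = sum(Y * G).
eps = 1e-6
Kp = K.copy(); Kp[0, 0] += eps
numeric = (np.sum(corr2d_valid(X, Kp) * G) - np.sum(corr2d_valid(X, K) * G)) / eps
```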

3.2 BP for Pooling layer

Average pooling layer
Upstream gradient in the vector form: $\delta^{(l-1)} = \frac{1}{p^2}\,\mathrm{upsample}\big(\delta^{(l)}\big)$,
where the upsample operation replicates each entry of $\delta^{(l)}$ over its corresponding $p \times p$ pooling region.
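A short sketch of this backward pass for 2Γ—2 average pooling (the example gradient values are our own):

```python
import numpy as np

# Sketch: backward pass of average pooling. Each upstream gradient value
# is spread evenly over the p*p inputs of its pooling region.
p = 2
delta_out = np.array([[1., 2.],
                      [3., 4.]])          # upstream gradient, one per region
delta_in = np.kron(delta_out, np.ones((p, p))) / (p * p)
```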
Max pooling layer
Upstream gradient in the vector form: $\delta^{(l-1)} = \mathrm{upsample}\big(\delta^{(l)}\big)$,
where the upsample operation places each entry of $\delta^{(l)}$ at the position of the maximal value of its pooling region (recorded in the forward pass) and fills the rest of the region with zeros.
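The max-pooling backward pass can be sketched as follows (the example input and gradient values are our own):

```python
import numpy as np

# Sketch: backward pass of 2x2 max pooling. The upstream gradient is
# routed entirely to the position that held the maximum in the forward
# pass; all other positions in the region receive zero.
p = 2
X = np.array([[1., 5., 2., 0.],
              [3., 4., 7., 8.],
              [0., 1., 9., 2.],
              [1., 0., 3., 4.]])
delta_out = np.array([[1., 2.],
                      [3., 4.]])

delta_in = np.zeros_like(X)
for i in range(delta_out.shape[0]):
    for j in range(delta_out.shape[1]):
        block = X[i*p:(i+1)*p, j*p:(j+1)*p]
        a, b = np.unravel_index(np.argmax(block), block.shape)
        delta_in[i*p + a, j*p + b] = delta_out[i, j]
```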

3.3 BP Algorithm summary

For $l = L, L-1, \dots, 2$ do
  • If $l = L$: $\delta^{(L)} = (a^{(L)} - y) \odot f'(z^{(L)})$ (Euclidean loss) or $\delta^{(L)} = a^{(L)} - y$ (softmax output with cross-entropy loss)
  • If $l < L$ and layer $l$ is fully connected: $\delta^{(l)} = \big((W^{(l+1)})^{\top} \delta^{(l+1)}\big) \odot f'(z^{(l)})$
  • If layer $l$ is a convolutional layer: $\delta_k^{(l)} = \big(\delta_k^{(l+1)} *_{\text{full}} w_k^{(l+1)}\big) \odot f'(z_k^{(l)})$
  • If layer $l$ is a pooling layer (avg. or max.):
    • $\delta_k^{(l)} = \mathrm{upsample}\big(\delta_k^{(l+1)}\big)$ ($k$ indexes the feature map)
  • Do weight adjustment:
    • $w_{ji} \leftarrow w_{ji} - \eta \frac{\partial E}{\partial w_{ji}}$ and $b_j \leftarrow b_j - \eta \frac{\partial E}{\partial b_j}$, where $w_{ji}$ denotes the connection weight from node $i$ to node $j$ and $b_j$ denotes the bias on node $j$ (in any feature map)
Note:
  • The overall gradients are averages over the training samples: $\frac{\partial E}{\partial w} = \frac{1}{N}\sum_{n=1}^{N} \frac{\partial E^{(n)}}{\partial w}$
  • Weight decay is often used: $w \leftarrow w - \eta \left( \frac{\partial E}{\partial w} + \lambda w \right)$
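The weight-decay update in the note above can be sketched as a single step; the learning rate eta and decay coefficient lam are our own symbol names:

```python
import numpy as np

# Sketch: one gradient step with weight decay, w <- w - eta * (grad + lam * w).
def sgd_step_with_decay(w, grad, eta=0.1, lam=0.01):
    return w - eta * (grad + lam * w)

w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
w_new = sgd_step_with_decay(w, grad)
```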

4. Construction of CNN

The convolutional layers and pooling layers can be combined freely with other layers that we have discussed before (fully connected layers; sigmoid, ReLU, or other activation layers; the Euclidean loss layer; the cross-entropy loss layer) and with layers we haven't discussed yet: the local response normalization layer, dropout layer, batch normalization layer, etc.
The modules can be stacked in various structures:
[Figure: examples of CNN structures]
A training process for a CNN is shown below.
[Figure: CNN training process]
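As a sketch of such a stacking, here is one possible tiny forward pass (conv β†’ ReLU β†’ avg pool β†’ fully connected); all sizes, the helper corr2d_valid, and the random weights are illustrative assumptions, not a prescribed architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def corr2d_valid(X, K):
    """'Valid' correlation, as used throughout these notes."""
    m, n = K.shape
    H, W = X.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i+m, j:j+n] * K)
    return out

X = rng.standard_normal((8, 8))               # input "image"
K = rng.standard_normal((3, 3))               # one convolution filter

h = np.maximum(corr2d_valid(X, K), 0)         # conv + ReLU, shape (6, 6)
p = h.reshape(3, 2, 3, 2).mean(axis=(1, 3))   # 2x2 avg pooling, shape (3, 3)
W_fc = rng.standard_normal((10, 9))           # fully connected layer
logits = W_fc @ p.ravel()                     # 10 class scores
```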
