HW4 Report

1. Baseline Models

In this homework, I construct two baseline models, an MLP and a CNN; their structures are shown below.
[Figure: MLP baseline structure]
[Figure: CNN baseline structure]
Since there are two tasks (MNIST and CIFAR-10) in this homework, the same two baseline models are used for both datasets; only the input dimension of the first layer differs between them.
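The figures above give the exact layer configurations; the sketch below only illustrates how the two baselines can be written so that the dataset-dependent input size is the single argument that changes. The hidden widths, channel counts, and kernel sizes here are assumptions, not the values actually used.

```python
import torch.nn as nn

class MLPBaseline(nn.Module):
    """Fully-connected baseline; only in_features changes between datasets
    (28*28*1 = 784 for MNIST, 32*32*3 = 3072 for CIFAR-10)."""
    def __init__(self, in_features, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)


class CNNBaseline(nn.Module):
    """Convolutional baseline; only in_channels (1 vs. 3) and the flattened
    feature size feat_dim change between MNIST and CIFAR-10."""
    def __init__(self, in_channels, feat_dim, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```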

1.1 MLP & CNN Baseline on MNIST

Each model is trained for 30 epochs; the training error and validation accuracy are shown below.
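For reference, the curves in this section come from an ordinary training/validation loop; a minimal sketch is given below, assuming plain SGD with cross-entropy loss. The actual learning rate and other hyper-parameters may differ, and the report's "training error" may refer to either the loss or the error rate.

```python
import torch
import torch.nn as nn

def run_training(model, train_loader, val_loader, epochs=30, lr=0.01, device="cpu"):
    """Train for `epochs` epochs, recording average training loss and
    validation accuracy per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        # training pass
        model.train()
        running_loss = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * x.size(0)

        # validation pass
        model.eval()
        correct = 0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()

        history.append((running_loss / len(train_loader.dataset),
                        correct / len(val_loader.dataset)))
    return history
```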
[Figure: training error and validation accuracy curves on MNIST]
The test accuracy of each baseline model is shown below.
[Figure: MLP baseline test accuracy]
[Figure: CNN baseline test accuracy]
Note that the CNN baseline performs better. The gap between the CNN and the MLP is not significant because MNIST is relatively easy for both models.

1.2 MLP & CNN Baseline on CIFAR-10

[Figure: training error (left) and validation accuracy (right) on CIFAR-10]
Note that the CNN performs significantly better than the MLP. From the left panel of the figure above, we can see that both the MLP and the CNN over-fit after about 20 epochs.
The test accuracy of each baseline model is shown below.
[Figure: MLP baseline test accuracy]
[Figure: CNN baseline test accuracy]

2. Using Dropout

2.1 MLP with dropout on MNIST

Dropout is added after each FC layer (after the activation function). The structure of the MLP with dropout is shown below.
[Figure: MLP with dropout structure]
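The figure above fixes the exact configuration; as an illustration, the MNIST MLP with dropout inserted after each activation might look like the following (p=0.5 and the hidden widths are assumed values, not taken from the report).

```python
import torch.nn as nn

# Baseline MLP for MNIST with nn.Dropout inserted after each hidden activation.
# p=0.5 and the hidden widths are assumptions, not values from the report.
mlp_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)
```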
Training error and test accuracy are shown below.
[Figure: training error and test accuracy curves]
[Figure: MLP baseline test accuracy]
[Figure: MLP with dropout test accuracy]
It can be observed that, after adding dropout, the MLP converges more slowly than the baseline MLP. The final performances show no significant difference in this scenario.

2.2 CNN with dropout on CIFAR-10

Dropout is added after each FC layer (after the activation function). The structure of the CNN with dropout is shown below.
[Figure: CNN with dropout structure]
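Only the fully-connected part of the CNN changes; a sketch of the classifier head with dropout is given below (the flattened feature size and p=0.5 are assumptions). Note that nn.Dropout is only active in model.train() mode, so the test accuracies reported below must be measured after switching to model.eval().

```python
import torch.nn as nn

# Classifier head of the CNN with dropout after each FC activation; the
# convolutional feature extractor is unchanged. The flattened size 64*8*8
# and p=0.5 are assumptions, not values from the report.
cnn_dropout_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 10),
)
```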
Training error and test accuracy are shown below.
[Figure: training error and test accuracy curves]
[Figure: CNN baseline test accuracy]
[Figure: CNN with dropout test accuracy]
It can be observed that the CNN with dropout performs better than the baseline CNN, and dropout largely alleviates the over-fitting problem.

3. Weight Decay

A new optimizer with L2 weight decay is used: optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3).
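For context, this is the only change relative to the baseline setup; the snippet below shows the optimizer in place (the placeholder model simply stands in for the MLP/CNN defined earlier).

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)  # placeholder; stands in for the MLP/CNN above

# weight_decay adds an L2 penalty to the weights inside the SGD update;
# this is the only change relative to the baseline optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
```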

3.1 MLP with weight decay on MNIST

Training error and test accuracy are shown below.
[Figure: training error and test accuracy curves]
[Figure: MLP baseline test accuracy]
[Figure: MLP with weight decay test accuracy]
It can be observed that the model converges faster with weight decay. The difference in final performance is not significant.

3.2 CNN with weight decay on CIFAR-10

Training error and test accuracy are shown below.
[Figure: training error and test accuracy curves]
[Figure: CNN baseline test accuracy]
[Figure: CNN with weight decay test accuracy]
It can be observed that weight decay largely alleviates the over-fitting. The difference in final performance is not significant.

4. CNN with Data Augmentation on CIFAR-10

The data augmentation methods include random horizontal flip, random erasing, and random crop.
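A sketch of how these three transforms are typically combined with torchvision is given below; the crop padding, erasing probability, and transform order are common defaults and may differ from the values actually used.

```python
import torchvision.transforms as T

# Training-time augmentation for CIFAR-10. The padding, erasing probability,
# and transform order are common defaults, not values from the report.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.RandomErasing(p=0.5),   # operates on tensors, so it comes after ToTensor
])

# The test set is left unaugmented.
test_transform = T.Compose([T.ToTensor()])
```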
Training error and test accuracy are shown below.
[Figure: training error and test accuracy curves]
[Figure: CNN baseline test accuracy]
[Figure: CNN with data augmentation test accuracy]
It can be observed that, although the CNN with data augmentation converges more slowly than the baseline CNN, its final performance is significantly better.
