1. Baseline Models
In this homework, I construct two baseline models, an MLP and a CNN; their structures are shown below.
Since there are two tasks (MNIST and CIFAR-10) in this homework, the two baseline models differ across datasets only in the input dimension of their first layer (a minimal code sketch follows).
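As a minimal PyTorch sketch of the two baselines (the layer widths and channel counts below are assumptions, not the exact homework settings), only the first layer's input changes between tasks: 28x28x1 for MNIST versus 32x32x3 for CIFAR-10.

import torch.nn as nn

# Baseline MLP: fully connected layers with ReLU activations.
# Hidden widths (512, 256) are assumed values.
class MLPBaseline(nn.Module):
    def __init__(self, in_dim=28 * 28, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Baseline CNN: two conv blocks followed by an FC classifier head.
# Channel counts are assumed values; LazyLinear infers the flattened
# feature size, so the same code works for MNIST and CIFAR-10 inputs.
class CNNBaseline(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))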
1.1 MLP & CNN Baseline on MNIST
Each model is trained for 30 epochs; the training error and validation accuracy are shown below.
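The per-epoch loop that records these curves follows the standard PyTorch pattern; the sketch below is an assumption of that setup, not the exact training script.

import torch
import torch.nn as nn

def train_and_evaluate(model, train_loader, valid_loader, optimizer, epochs=30, device="cpu"):
    # Train for the given number of epochs, logging training loss and
    # validation accuracy after each epoch.
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * x.size(0)

        model.eval()
        correct = 0
        with torch.no_grad():
            for x, y in valid_loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
        print(f"epoch {epoch + 1}: "
              f"train loss {train_loss / len(train_loader.dataset):.4f}, "
              f"valid acc {correct / len(valid_loader.dataset):.4f}")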
The test accuracies of the baseline models are shown below.
Note that the CNN baseline performs better. However, the gap between the CNN and the MLP is not large, because MNIST is relatively easy for both models.
1.2 MLP & CNN Baseline on CIFAR-10
Note that the CNN performs significantly better than the MLP. From the left figure above, we can see that both the MLP and the CNN start to over-fit after about 20 epochs.
The test accuracies of the baseline models are shown below.
2. Using Dropout
2.1 MLP with dropout on MNIST
Dropout is added after each FC layer (after the activation function). The structure of the MLP with dropout is shown below.
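A minimal sketch of where the dropout layers sit is given below; the hidden widths and the dropout probability p=0.5 are assumed values, not the exact homework settings.

import torch.nn as nn

# MLP with dropout inserted after each hidden FC layer's activation.
mlp_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)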
The training error and test accuracy are shown below.
It can be observed that, after adding dropout, the MLP converges more slowly than the baseline MLP. The final performance shows no significant difference in this scenario.
2.2 CNN with dropout on CIFAR-10
Dropout is added after each FC layer (after the activation function). The structure of the CNN with dropout is shown below.
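As with the MLP, only the FC classifier head changes; the sketch below shows the assumed placement (the width 256 and p=0.5 are assumptions), with the convolutional feature extractor left as in the baseline.

import torch.nn as nn

# Classifier head of the CNN with dropout after each FC layer's activation;
# the convolutional part of the network is unchanged from the baseline.
cnn_classifier_dropout = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)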
The training error and test accuracy are shown below.
It can be observed that the CNN with dropout performs better than the baseline CNN, and that dropout largely mitigates the over-fitting problem.
3. Weight Decay
Weight decay is added by using a new optimizer:

import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
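For SGD, this weight decay term is equivalent to an L2 penalty on the weights: each update becomes w ← w − lr · (∇L(w) + λ·w) with λ = 5e-3, which shrinks the weights toward zero and thereby regularizes the model.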
3.1 MLP with weight decay on MNIST
The training error and test accuracy are shown below.
It can be observed that, after using weight decay, the model converges faster. The difference in final performance is not significant.
3.2 CNN with weight decay on CIFAR-10
The training error and test accuracy are shown below.
It can be observed that, after using weight decay, over-fitting is largely alleviated. The difference in final performance is not significant.
4. CNN with Data Augmentation on CIFAR-10
The data augmentation methods include random horizontal flip, random crop, and random erasing; a sketch of the transform pipeline is given below.
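A minimal torchvision sketch of such a pipeline follows; the crop padding, erasing probability, and normalization statistics are assumed values, not the exact homework settings.

import torchvision.transforms as T

# Training-time augmentation for CIFAR-10.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    T.RandomErasing(p=0.5),  # RandomErasing operates on tensors, so it comes after ToTensor()
])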
The training error and test accuracy are shown below.
It can be observed that, although the CNN with data augmentation converges more slowly than the CNN baseline, its final performance is significantly better.