1Cademy - AlexNet Empirical Overfitting Observation

Learn Before

AlexNet Convolutional Neural Network

Concept

AlexNet Empirical Overfitting Observation

A striking empirical observation when training the AlexNet architecture on a small dataset such as Fashion-MNIST is the near absence of overfitting, despite a massive gap between the model's capacity and the dataset size. For example, the two hidden fully connected layers of AlexNet alone contain over 40 million learnable parameters, while the Fashion-MNIST training set consists of only 60,000 images, a ratio of roughly 700 parameters per training example. Yet the training and validation losses remain virtually identical throughout training. This generalization is attributed to the improved regularization built into the network's design, specifically the use of dropout in the fully connected layers.
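The parameter count quoted above can be checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions not stated in this card: it follows the AlexNet variant described in Dive into Deep Learning, with Fashion-MNIST images resized to 224 x 224, 256 channels after the last convolution, and two hidden fully connected layers of 4096 units each. The helper names `conv_out` and `dense_params` are hypothetical.

```python
# Assumed layer configuration: the AlexNet variant from Dive into Deep Learning,
# applied to Fashion-MNIST images resized to 224 x 224.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

size = 224
size = conv_out(size, kernel=11, stride=4, padding=1)  # conv 1
size = conv_out(size, kernel=3, stride=2)              # max pool 1
size = conv_out(size, kernel=5, padding=2)             # conv 2
size = conv_out(size, kernel=3, stride=2)              # max pool 2
for _ in range(3):                                     # conv 3, 4, 5
    size = conv_out(size, kernel=3, padding=1)
size = conv_out(size, kernel=3, stride=2)              # max pool 3

flat_features = 256 * size * size                      # input width of the first dense layer
hidden_fc_params = dense_params(flat_features, 4096) + dense_params(4096, 4096)

train_images = 60_000                                  # Fashion-MNIST training set size
params_per_image = hidden_fc_params / train_images

print(f"flattened features: {flat_features}")
print(f"hidden FC parameters: {hidden_fc_params:,}")
print(f"parameters per training image: {params_per_image:.0f}")
```

Under these assumptions the two hidden fully connected layers hold about 43 million parameters, which is consistent with the "over 40 million" figure and works out to roughly 700 parameters for every training image.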

Updated 2026-05-13

References

Dive into Deep Learning
