Concept
AlexNet Empirical Overfitting Observation
A striking empirical observation when training the AlexNet architecture on a small dataset such as Fashion-MNIST is the near absence of overfitting, despite a massive gap between the model's capacity and the dataset size. For example, the final two fully connected hidden layers of AlexNet contain over 40 million learnable parameters, while the Fashion-MNIST training set consists of only 60,000 images. Yet the training and validation losses remain virtually identical throughout training. This generalization is attributed to the improved regularization built into the network's design, in particular the use of dropout.
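The size of this gap can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the single-channel, 224×224-input AlexNet variant used in D2L, where the flattened convolutional output has 6400 features and each of the two hidden fully connected layers has 4096 units. The function and constant names are hypothetical.

```python
# Rough parameter count for AlexNet's two hidden fully connected layers.
# Assumptions (not stated in the text above): 6400 flattened input features
# and 4096 units per hidden layer, as in the D2L Fashion-MNIST variant.

def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

FLATTENED_FEATURES = 6400   # assumed output size of the convolutional stack
HIDDEN_UNITS = 4096         # assumed width of each hidden dense layer
NUM_TRAIN_IMAGES = 60_000   # Fashion-MNIST training set size

fc1 = linear_params(FLATTENED_FEATURES, HIDDEN_UNITS)
fc2 = linear_params(HIDDEN_UNITS, HIDDEN_UNITS)
total = fc1 + fc2
params_per_image = total / NUM_TRAIN_IMAGES

print(f"fc1: {fc1:,} parameters")
print(f"fc2: {fc2:,} parameters")
print(f"total: {total:,} parameters")
print(f"parameters per training image: {params_per_image:.0f}")
```

Under these assumptions the two layers alone hold roughly 43 million parameters, or about 700 parameters per training image, which is why the matching training and validation losses are notable.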
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L