AlexNet Empirical Overfitting Observation

A striking empirical observation when training AlexNet on a dataset such as Fashion-MNIST is the near absence of overfitting, despite a large mismatch between model capacity and dataset size. The last two hidden fully connected layers of AlexNet alone contain more than 40 million learnable parameters, while the training set consists of only 60,000 images. Yet the training and validation losses remain virtually identical throughout training. This generalization is attributed to the improved regularization built into the network's design, in particular dropout.
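The parameter count quoted above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the layer sizes of the D2L variant of AlexNet on 224×224 inputs (a flattened feature vector of 6400 values feeding two hidden layers of 4096 units each), which are not stated in this note, and the helper `linear_params` is a hypothetical name introduced for the example.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Learnable parameters of one fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


# Assumed sizes (D2L-style AlexNet on 224x224 inputs); adjust for other variants.
FLATTENED = 6400       # assumed 256 channels * 5 * 5 after the last pooling layer
HIDDEN = 4096          # assumed width of each hidden fully connected layer
TRAIN_IMAGES = 60_000  # Fashion-MNIST training set size, as stated in the text

fc1 = linear_params(FLATTENED, HIDDEN)  # first hidden fully connected layer
fc2 = linear_params(HIDDEN, HIDDEN)     # second hidden fully connected layer
total = fc1 + fc2
ratio = total / TRAIN_IMAGES            # parameters per training image

print(f"fc1: {fc1:,}  fc2: {fc2:,}  total: {total:,}")
print(f"parameters per training image: {ratio:.0f}")
```

Under these assumptions the two layers hold about 43 million parameters, roughly 700 per training image, which is the regime in which one would normally expect severe overfitting without regularization such as dropout.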

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L