1Cademy - ResNet Initial Layers

Learn Before

ResNets Convolutional Neural Network

Concept

ResNet Initial Layers

In the original ResNet architecture designed for ImageNet-scale images, the initial layers consist of a $7 \times 7$ convolutional layer with $64$ output channels and a stride of $2$, followed by batch normalization, a ReLU activation, and a $3 \times 3$ max-pooling layer with a stride of $2$. Together, the two stride-$2$ layers reduce each spatial dimension by a factor of $4$ before the first residual block. This large kernel and aggressive downsampling are appropriate for the $224 \times 224$ pixel inputs common in ImageNet. However, when working with significantly smaller images (e.g., Fashion-MNIST at its native $28 \times 28$ pixels or resized to $96 \times 96$, or CIFAR at $32 \times 32$), these initial layers reduce the spatial dimensions too aggressively, leaving insufficient resolution for the subsequent residual blocks to extract meaningful features.
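The effect of the initial layers on spatial resolution can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `conv_output_size` and `stem_output_size` are hypothetical, and the padding values ($3$ for the convolution, $1$ for the max-pooling layer) are assumptions taken from common ResNet implementations, since the text above does not state them. Batch normalization and ReLU are omitted because they do not change the spatial size.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output_size(size: int) -> int:
    """Side length after the 7x7 stride-2 conv and the 3x3 stride-2 max-pool."""
    # Assumption: padding of 3 for the 7x7 convolution (not stated in the text).
    after_conv = conv_output_size(size, kernel=7, stride=2, padding=3)
    # Assumption: padding of 1 for the 3x3 max-pooling layer.
    return conv_output_size(after_conv, kernel=3, stride=2, padding=1)


for side in (224, 96, 28):
    out = stem_output_size(side)
    print(f"{side}x{side} input -> {out}x{out} feature map")
```

Under these assumptions, a $224 \times 224$ input leaves the initial layers as a $56 \times 56$ feature map, whereas a $96 \times 96$ input is already down to $24 \times 24$ and a $28 \times 28$ input to $7 \times 7$, before any residual block has run.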

Updated 2026-05-18

References

Dive into Deep Learning

Learn After

Modified ResNet-18 for Small Images
