ResNet Initial Layers

In the original ResNet architecture designed for ImageNet-scale images, the initial layers consist of a $7 \times 7$ convolutional layer with 64 output channels and a stride of 2, followed by batch normalization, a ReLU activation, and a $3 \times 3$ max-pooling layer with a stride of 2. Together these layers shrink each spatial dimension by roughly a factor of 4, so a $224 \times 224$ ImageNet input leaves the stem as a $56 \times 56$ feature map. Such large receptive fields and aggressive downsampling suit inputs of that size. However, for much smaller images (e.g., $28 \times 28$ Fashion-MNIST images, Fashion-MNIST upsampled to $96 \times 96$, or $32 \times 32$ CIFAR images), the same layers reduce spatial dimensions too aggressively: a $28 \times 28$ input is already down to $7 \times 7$ after the stem, leaving insufficient resolution for the subsequent residual blocks to extract meaningful features.
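
The downsampling argument can be checked with the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below is a minimal illustration, not the reference implementation: it assumes padding 3 for the $7 \times 7$ convolution and padding 1 for the $3 \times 3$ max-pool (the values common implementations use, not stated above), and assumes a ResNet-18-style layout in which three later stages each halve the resolution with a $3 \times 3$, stride-2, padding-1 convolution. The helper names `conv_out`, `stem_out`, and `stage_sizes` are hypothetical.

```python
# Minimal sketch: track the spatial side length through the ResNet stem and
# later downsampling stages using floor((n + 2p - k) / s) + 1.
# ASSUMPTION: padding 3 for the 7x7 conv, padding 1 for the 3x3 max-pool,
# and three later stages that each downsample with a 3x3/stride-2/pad-1 conv.


def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a square conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def stem_out(n: int) -> int:
    """Side length after the 7x7/stride-2 conv and the 3x3/stride-2 max-pool."""
    n = conv_out(n, kernel=7, stride=2, padding=3)  # 7x7 conv, 64 channels
    # Batch normalization and ReLU leave the spatial size unchanged.
    n = conv_out(n, kernel=3, stride=2, padding=1)  # 3x3 max-pool
    return n


def stage_sizes(n: int, downsampling_stages: int = 3) -> list[int]:
    """Side lengths after the stem and after each later downsampling stage."""
    sizes = [stem_out(n)]
    for _ in range(downsampling_stages):
        sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))
    return sizes


for side in (224, 96, 28):
    print(side, "->", stage_sizes(side))
```

Running it prints `224 -> [56, 28, 14, 7]`, `96 -> [24, 12, 6, 3]`, and `28 -> [7, 4, 2, 1]`: under these assumptions a $28 \times 28$ image is reduced to a single spatial position before the last stage finishes, which is the loss of resolution described above.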

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L