1Cademy - Modified ResNet-18 for Small Images


Modified ResNet-18 for Small Images

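When training on datasets with small input images, the standard ResNet-18 architecture must be adapted to prevent excessive spatial downsampling in the early layers. In the original design, the stride-$2$ $7 \times 7$ convolution followed by the stride-$2$ max-pooling layer shrinks each spatial side by a factor of $4$ before the first residual block. The key modifications compared to the original ResNet (Section 8.6) are: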

Smaller initial convolution: The opening $7 \times 7$ convolutional layer (stride $2$) is replaced with a $3 \times 3$ convolution using stride $1$ and padding $1$, followed by batch normalization and ReLU. This preserves the spatial resolution of the input.
Max-pooling removed: The $3 \times 3$ max-pooling layer (stride $2$) that normally follows the initial convolution is omitted entirely.
Residual body unchanged: The body still consists of four groups of residual blocks with channel progression $64 \to 128 \to 256 \to 512$, each group containing $2$ residual blocks. In every group except the first, the opening block halves the spatial dimensions with a stride-$2$ $3 \times 3$ convolution on its main path, and uses a $1 \times 1$ convolution shortcut with stride $2$ so that the shortcut matches both the reduced resolution and the increased channel count.
Output head: A global adaptive average pooling layer collapses the spatial dimensions to $1 \times 1$, followed by a flatten operation and a single fully connected layer mapping $512$ features to the number of output classes.
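The effect of these changes on spatial resolution can be checked with the usual output-size formula for a convolution or pooling layer, $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below traces the side length of a square feature map through both variants. It assumes a hypothetical $32 \times 32$ input and the common padding conventions (padding $3$ for the $7 \times 7$ stem, padding $1$ for the max-pool and every $3 \times 3$ convolution, padding $0$ for the $1 \times 1$ shortcut); the helper names `conv_out` and `trace` are illustrative only, not taken from Dive into Deep Learning.

```python
def conv_out(size, kernel, stride, padding):
    """Side length after a square conv/pool layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, modified):
    """Return (stage, side length) pairs for a square input.

    Assumed paddings (not stated in the source): 3 for the 7x7 stem,
    1 for the max-pool and all 3x3 convs, 0 for the 1x1 shortcut.
    """
    stages = []
    if modified:
        # Modified stem: 3x3 conv, stride 1, padding 1; no max-pool.
        size = conv_out(size, kernel=3, stride=1, padding=1)
        stages.append(("stem conv 3x3/s1", size))
    else:
        # Original stem: 7x7 conv, stride 2, then 3x3 max-pool, stride 2.
        size = conv_out(size, kernel=7, stride=2, padding=3)
        stages.append(("stem conv 7x7/s2", size))
        size = conv_out(size, kernel=3, stride=2, padding=1)
        stages.append(("max-pool 3x3/s2", size))

    for group, channels in enumerate([64, 128, 256, 512], start=1):
        # Only the opening block of groups 2-4 downsamples; the remaining
        # 3x3 convs (stride 1, padding 1) keep the side length unchanged.
        stride = 1 if group == 1 else 2
        main = conv_out(size, kernel=3, stride=stride, padding=1)
        if stride == 2:
            # The 1x1 stride-2 shortcut must produce the same side length.
            shortcut = conv_out(size, kernel=1, stride=2, padding=0)
            assert shortcut == main, "shortcut and main path disagree"
        size = main
        stages.append((f"group {group} ({channels} ch)", size))
    return stages


for modified in (False, True):
    sizes = [side for _, side in trace(32, modified)]
    print("modified" if modified else "standard", sizes)
# standard [16, 8, 8, 4, 2, 1]
# modified [32, 32, 16, 8, 4]
```

Under these assumptions, the original stem leaves only an $8 \times 8$ map entering the residual body and a $1 \times 1$ map before pooling, whereas the modified stem keeps $32 \times 32$ through the first group and still has a $4 \times 4$ map for the global average pooling layer to summarize.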

This modified ResNet-18 serves as a practical toy network for multi-GPU training demonstrations: it is more expressive than LeNet yet still quick to train.

Updated 2026-05-22


References

Dive into Deep Learning
