Modified ResNet-18 for Small Images

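When training on datasets with small input images, the standard ResNet-18 architecture must be adapted to prevent excessive spatial downsampling in the early layers: its stem (a stride-2 convolution followed by stride-2 max-pooling) shrinks each spatial dimension fourfold before the first residual block. The key modifications compared to the original ResNet (Section 8.6) are: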

  1. Smaller initial convolution: The opening $7 \times 7$ convolutional layer (stride 2) is replaced with a $3 \times 3$ convolution using stride 1 and padding 1, followed by batch normalization and ReLU. This preserves the spatial resolution of the input.
  2. Max-pooling removed: The $3 \times 3$ max-pooling layer (stride 2) that normally follows the initial convolution is omitted entirely.
  3. Residual body unchanged: The body still consists of four groups of residual blocks with channel progression $64 \to 128 \to 256 \to 512$, each group containing 2 residual blocks. In every group except the first, the opening block uses a $1 \times 1$ convolution shortcut with stride 2 to halve the spatial dimensions and match the increased channel count.
  4. Output head: A global adaptive average pooling layer collapses the spatial dimensions to $1 \times 1$, followed by a flatten operation and a single fully connected layer mapping 512 features to the number of output classes.
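
The four points above can be sketched in PyTorch roughly as follows. This is a minimal illustration rather than a reference implementation: the names `Residual`, `resnet_group`, and `resnet18_small` are chosen here for illustration, and the single-channel $28 \times 28$ input in the usage note is an assumption, not something the text above specifies.

```python
import torch
from torch import nn
from torch.nn import functional as F


class Residual(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a shortcut."""

    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1)
        # Optional 1x1 convolution so the shortcut matches the main path
        # in both channel count and spatial size.
        self.conv3 = (nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                stride=stride) if use_1x1conv else None)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        shortcut = x if self.conv3 is None else self.conv3(x)
        return F.relu(y + shortcut)


def resnet_group(in_channels, out_channels, num_residuals, first_group=False):
    """One group of residual blocks; all but the first group downsample."""
    blocks = []
    for i in range(num_residuals):
        if i == 0 and not first_group:
            # Opening block: stride 2 with a 1x1 convolution shortcut.
            blocks.append(Residual(in_channels, out_channels,
                                   use_1x1conv=True, stride=2))
        else:
            blocks.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blocks)


def resnet18_small(num_classes, in_channels=1):
    """ResNet-18 variant with a 3x3 stride-1 stem and no max-pooling."""
    return nn.Sequential(
        # 1. Smaller initial convolution; 2. no max-pooling afterwards.
        nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        # 3. Residual body: 64 -> 128 -> 256 -> 512, two blocks per group.
        resnet_group(64, 64, 2, first_group=True),
        resnet_group(64, 128, 2),
        resnet_group(128, 256, 2),
        resnet_group(256, 512, 2),
        # 4. Output head.
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(512, num_classes),
    )
```

Under these assumptions, `resnet18_small(num_classes=10)(torch.zeros(2, 1, 28, 28))` returns a tensor of shape `(2, 10)`. Because the head uses adaptive pooling, the same network accepts other input sizes without changes.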

This modified ResNet-18 serves as a practical toy network for multi-GPU training demonstrations: it is more expressive than LeNet yet remains quick to train.
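
As a rough check on the downsampling argument, the spatial size reaching the pooling head can be traced with the usual output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below assumes a $28 \times 28$ input and the customary paddings for the standard stem (3 for the $7 \times 7$ convolution, 1 for the max-pooling layer); neither is stated in the text above, and the helper names are illustrative.

```python
def out_size(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Each layer is (kernel, stride, padding).
STANDARD_STEM = [(7, 2, 3), (3, 2, 1)]   # 7x7 conv, then 3x3 max-pool (assumed paddings)
MODIFIED_STEM = [(3, 1, 1)]              # 3x3 conv, stride 1, padding 1
DOWNSAMPLING_GROUPS = [(3, 2, 1)] * 3    # groups 2-4 each halve the resolution


def final_size(size, stem):
    """Spatial size after the stem and all four residual groups."""
    for kernel, stride, padding in stem + DOWNSAMPLING_GROUPS:
        size = out_size(size, kernel, stride, padding)
    return size


print(final_size(28, STANDARD_STEM))  # 1
print(final_size(28, MODIFIED_STEM))  # 4
```

With the standard stem, a $28 \times 28$ image is already down to $7 \times 7$ when it enters the residual body and collapses to a single pixel by the last group; with the modified stem, the body starts at the full $28 \times 28$ and ends at $4 \times 4$.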

Updated 2026-05-22

Tags

D2L

Dive into Deep Learning @ D2L