Modified ResNet-18 for Small Images
When training on datasets with tiny input images, the standard ResNet-18 architecture must be adapted to prevent excessive spatial downsampling in the early layers. The key modifications compared to the original ResNet (Section 8.6) are:
- Smaller initial convolution: The opening 7×7 convolutional layer (stride 2) is replaced with a 3×3 convolution using stride 1 and padding 1, followed by batch normalization and ReLU. This preserves the spatial resolution of the input.
- Max-pooling removed: The 3×3 max-pooling layer (stride 2) that normally follows the initial convolution is omitted entirely.
- Residual body unchanged: The body still consists of four groups of residual blocks with channel progressions 64, 128, 256, and 512, each group containing 2 residual blocks. In every group except the first, the opening block uses a 1×1 convolution shortcut with stride 2 to halve spatial dimensions and match the increased channel count.
- Output head: A global adaptive average pooling layer collapses the spatial dimensions to 1×1, followed by a flatten operation and a single fully connected layer mapping 512 features to the number of output classes.
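The modifications listed above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not a definitive implementation: the names `Residual`, `resnet_group`, and `resnet18_small` are illustrative choices, and the single-channel 28×28 input in the usage lines is an assumption made only for the shape check.

```python
import torch
from torch import nn
from torch.nn import functional as F


class Residual(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a shortcut."""

    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Optional 1x1 convolution on the shortcut so that shapes match when
        # the block changes the channel count or the spatial size.
        self.conv3 = (nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                stride=stride) if use_1x1conv else None)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)
        return F.relu(y + x)


def resnet_group(in_channels, out_channels, num_residuals, first_group=False):
    """Stack residual blocks; all groups but the first open with stride 2."""
    blocks = []
    for i in range(num_residuals):
        if i == 0 and not first_group:
            blocks.append(Residual(in_channels, out_channels,
                                   use_1x1conv=True, stride=2))
        else:
            blocks.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blocks)


def resnet18_small(num_classes, in_channels=1):
    """ResNet-18 variant with a 3x3 stride-1 stem and no max-pooling."""
    return nn.Sequential(
        # Stem: 3x3 convolution, stride 1, padding 1 keeps the input size.
        nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        # Body: four groups of two residual blocks each.
        resnet_group(64, 64, 2, first_group=True),
        resnet_group(64, 128, 2),
        resnet_group(128, 256, 2),
        resnet_group(256, 512, 2),
        # Head: global average pooling, flatten, one linear layer.
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(512, num_classes),
    )


if __name__ == "__main__":
    net = resnet18_small(num_classes=10)
    x = torch.randn(2, 1, 28, 28)  # assumed batch of two 28x28 grayscale images
    print(net(x).shape)  # torch.Size([2, 10])
```

Because the head uses adaptive pooling, the same sketch should accept other small input sizes without changes to the final linear layer.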
This modified ResNet-18 serves as a practical toy network for multi-GPU training demonstrations: it is more expressive than LeNet yet remains quick to train.
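To see why the stem changes matter for small inputs, the feature-map side length can be traced with the usual convolution output-size formula. The sketch below is illustrative only: the 28×28 input size is an assumption, the helper names `conv_out` and `trace` are hypothetical, and the standard stem is taken to be a 7×7 convolution (stride 2, padding 3) followed by 3×3 max-pooling (stride 2, padding 1).

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, stem):
    """Side length after the stem and after each of the four residual groups."""
    for kernel, stride, padding in stem:
        size = conv_out(size, kernel, stride, padding)
    sizes = [size]      # after the stem
    sizes.append(size)  # group 1 keeps the resolution
    for _ in range(3):  # groups 2-4 each open with a stride-2 3x3 convolution
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


# Each stem layer is given as (kernel, stride, padding).
standard_stem = [(7, 2, 3), (3, 2, 1)]  # 7x7 conv, then 3x3 max-pooling
small_stem = [(3, 1, 1)]                # 3x3 conv only, no max-pooling

print(trace(28, standard_stem))  # [7, 7, 4, 2, 1]
print(trace(28, small_stem))     # [28, 28, 14, 7, 4]
```

Under these assumptions the standard stem shrinks a 28×28 image to 7×7 before the residual body even begins, and the last group operates on a single pixel, whereas the modified stem leaves a 4×4 map for the final group.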
Updated 2026-05-22
Tags
D2L
Dive into Deep Learning @ D2L