LeNet-5 Layer-by-Layer Shape Trace

Passing a single-channel $28 \times 28$ image through LeNet-5 produces the following shapes at each layer:

  1. Conv2d ($5 \times 5$, padding $2$, $6$ channels): output $1 \times 6 \times 28 \times 28$
  2. Sigmoid: output $1 \times 6 \times 28 \times 28$
  3. AvgPool2d ($2 \times 2$, stride $2$): output $1 \times 6 \times 14 \times 14$
  4. Conv2d ($5 \times 5$, no padding, $16$ channels): output $1 \times 16 \times 10 \times 10$
  5. Sigmoid: output $1 \times 16 \times 10 \times 10$
  6. AvgPool2d ($2 \times 2$, stride $2$): output $1 \times 16 \times 5 \times 5$
  7. Flatten: output $1 \times 400$
  8. Linear ($120$) + Sigmoid: output $1 \times 120$
  9. Linear ($84$) + Sigmoid: output $1 \times 84$
  10. Linear ($10$): output $1 \times 10$
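
The shapes in the list above can be reproduced with plain integer arithmetic, without any deep learning framework. The sketch below is illustrative only: the helper names `conv_out` and `lenet5_shapes` are hypothetical, and it assumes the usual output-size rule for a sliding window (add twice the padding, subtract the kernel size, floor-divide by the stride, add one) with square kernels.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output length along one spatial axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(height=28, width=28, batch=1):
    """Return (layer, output shape) pairs for the ten steps listed above."""
    trace = []

    # Step 1: 5x5 convolution, padding 2, 6 output channels
    h, w = conv_out(height, 5, padding=2), conv_out(width, 5, padding=2)
    trace.append(("Conv2d", (batch, 6, h, w)))
    # Step 2: an elementwise activation leaves the shape unchanged
    trace.append(("Sigmoid", (batch, 6, h, w)))
    # Step 3: 2x2 average pooling with stride 2
    h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)
    trace.append(("AvgPool2d", (batch, 6, h, w)))

    # Step 4: 5x5 convolution, no padding, 16 output channels
    h, w = conv_out(h, 5), conv_out(w, 5)
    trace.append(("Conv2d", (batch, 16, h, w)))
    # Step 5: activation, shape unchanged
    trace.append(("Sigmoid", (batch, 16, h, w)))
    # Step 6: 2x2 average pooling with stride 2
    h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)
    trace.append(("AvgPool2d", (batch, 16, h, w)))

    # Step 7: merge channel and spatial axes into one feature axis
    trace.append(("Flatten", (batch, 16 * h * w)))

    # Steps 8-10: fully connected layers; only the last has no activation
    trace.append(("Linear+Sigmoid", (batch, 120)))
    trace.append(("Linear+Sigmoid", (batch, 84)))
    trace.append(("Linear", (batch, 10)))
    return trace


for name, shape in lenet5_shapes():
    print(f"{name:<15} {shape}")
```

Changing the `height` and `width` arguments shows how a different input resolution would propagate through the same stack, which is a quick way to check the size expected by the first fully connected layer.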

The first convolutional layer uses $2$ pixels of padding to preserve the spatial dimensions of the $28 \times 28$ input, compensating for the reduction that a $5 \times 5$ kernel would otherwise cause. The second convolutional layer uses no padding, reducing height and width by $4$ pixels each. As the stack deepens, the number of channels increases ($1 \to 6 \to 16$) while spatial dimensions shrink, until flattening yields a $400$-element vector.
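
The arithmetic behind these numbers can be written out explicitly. The worked equations below assume the standard output-size rule for a sliding window; the symbols $n_{\text{in}}$ (input size along one axis), $k$ (kernel size), $p$ (padding per side), and $s$ (stride) are notation introduced here for illustration and do not appear in the original text.

```latex
% Output size along one spatial axis
n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1

% First convolution: k = 5, p = 2, s = 1 (size preserved)
\left\lfloor \frac{28 + 2 \cdot 2 - 5}{1} \right\rfloor + 1 = 28

% First pooling: k = 2, p = 0, s = 2 (size halved)
\left\lfloor \frac{28 - 2}{2} \right\rfloor + 1 = 14

% Second convolution: k = 5, p = 0, s = 1 (size reduced by k - 1 = 4)
\left\lfloor \frac{14 - 5}{1} \right\rfloor + 1 = 10

% Second pooling: k = 2, p = 0, s = 2
\left\lfloor \frac{10 - 2}{2} \right\rfloor + 1 = 5

% Flatten: channels times height times width
16 \times 5 \times 5 = 400
```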

Updated 2026-05-12

Tags

D2L

Dive into Deep Learning @ D2L