1Cademy - LeNet-5 Layer-by-Layer Shape Trace

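Passing a single-channel $28 \times 28$ image through LeNet-5 as a batch of one (input shape $1 \times 1 \times 28 \times 28$) produces the following output shapes, written as batch $\times$ channels $\times$ height $\times$ width: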

Conv2d ($5 \times 5$, padding $2$, $6$ channels): output $1 \times 6 \times 28 \times 28$
Sigmoid: output $1 \times 6 \times 28 \times 28$
AvgPool2d ($2 \times 2$, stride $2$): output $1 \times 6 \times 14 \times 14$
Conv2d ($5 \times 5$, no padding, $16$ channels): output $1 \times 16 \times 10 \times 10$
Sigmoid: output $1 \times 16 \times 10 \times 10$
AvgPool2d ($2 \times 2$, stride $2$): output $1 \times 16 \times 5 \times 5$
Flatten: output $1 \times 400$
Linear ($120$) + Sigmoid: output $1 \times 120$
Linear ($84$) + Sigmoid: output $1 \times 84$
Linear ($10$): output $1 \times 10$
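
The shapes listed above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not code from the reference: it uses no deep learning library, it assumes the standard output-size rule for convolution and pooling windows (floor of (size + 2 × padding − kernel) / stride, plus 1), and the helper names `window_out` and `trace_lenet5` are hypothetical, introduced only for illustration.

```python
def window_out(size, kernel, padding=0, stride=1):
    """Length of one spatial dimension after a convolution or pooling window.

    Assumes the standard rule: floor((size + 2*padding - kernel) / stride) + 1.
    """
    return (size + 2 * padding - kernel) // stride + 1


def trace_lenet5(batch=1, height=28, width=28):
    """Return (layer name, output shape) pairs for the stack listed above."""
    trace = []
    h, w = height, width

    # Spatial layers as (name, out_channels, kernel, padding, stride).
    # Pooling keeps the channel count of the layer before it.
    spatial_layers = [
        ("Conv2d", 6, 5, 2, 1),
        ("AvgPool2d", 6, 2, 0, 2),
        ("Conv2d", 16, 5, 0, 1),
        ("AvgPool2d", 16, 2, 0, 2),
    ]
    for name, channels, kernel, padding, stride in spatial_layers:
        h = window_out(h, kernel, padding, stride)
        w = window_out(w, kernel, padding, stride)
        trace.append((name, (batch, channels, h, w)))
        if name == "Conv2d":
            # Sigmoid is elementwise, so it leaves the shape unchanged.
            trace.append(("Sigmoid", (batch, channels, h, w)))

    # Flatten merges channels, height, and width into one feature axis.
    features = channels * h * w
    trace.append(("Flatten", (batch, features)))

    # Each Linear layer only changes the feature axis; the Sigmoid that
    # follows the first two does not change the shape.
    for units in (120, 84, 10):
        trace.append((f"Linear({units})", (batch, units)))
    return trace


for name, shape in trace_lenet5():
    print(f"{name:12s} output shape: {shape}")
```

Changing `height` and `width` in the call shows how the flattened feature count depends on the input size, which is why the first Linear layer is tied to a $28 \times 28$ input.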

The first convolutional layer uses $2$ pixels of padding on each side to preserve the $28 \times 28$ spatial dimensions of the input, compensating for the $4$-pixel reduction that a $5 \times 5$ kernel would otherwise cause. The second convolutional layer uses no padding, so height and width each shrink by $4$ pixels, from $14$ to $10$. Each pooling layer halves the height and width. As the stack deepens, the number of channels increases ($1 \to 6 \to 16$) while the spatial dimensions shrink, until flattening the $16 \times 5 \times 5$ output yields a $400$-element vector.
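
As a worked check of the arithmetic in the paragraph above, assume the usual output-size rule for a window of size $k$ with padding $p$ and stride $s$ applied to an input of size $n$ (these symbols are introduced here for illustration and do not appear in the trace itself):

$$n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1$$

$$
\begin{aligned}
\text{first convolution:} \quad & \frac{28 + 2 \cdot 2 - 5}{1} + 1 = 28 \\
\text{first pooling:} \quad & \frac{28 - 2}{2} + 1 = 14 \\
\text{second convolution:} \quad & \frac{14 + 2 \cdot 0 - 5}{1} + 1 = 10 \\
\text{second pooling:} \quad & \frac{10 - 2}{2} + 1 = 5 \\
\text{flatten:} \quad & 16 \cdot 5 \cdot 5 = 400
\end{aligned}
$$

Under this rule, a stride-$1$ convolution with an odd kernel size $k$ preserves the spatial size when $p = (k - 1)/2$, which gives $p = 2$ for $k = 5$.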

Updated 2026-06-29

References

Dive into Deep Learning
