Example
GoogLeNet Layer-by-Layer Shape Trace
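Passing a single-channel 96 × 96 image (batch size 1) through GoogLeNet produces the following output shapes at each module, in (batch, channels, height, width) order:
- Module 1 (Stem): output 1 × 64 × 24 × 24
- Module 2: output 1 × 192 × 12 × 12
- Module 3 (2 Inception blocks): output 1 × 480 × 6 × 6
- Module 4 (5 Inception blocks): output 1 × 832 × 3 × 3
- Module 5 (2 Inception blocks + global avg pool): output 1 × 1024
- Linear (output layer): output 1 × 10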
The input height and width are reduced from 224 to 96 to keep training time on Fashion-MNIST reasonable. The spatial dimensions are progressively halved by max-pooling between modules (24 → 12 → 6 → 3), while the number of channels grows (64 → 192 → 480 → 832 → 1024). The global average pooling in Module 5 collapses the spatial dimensions to 1 × 1, and the final linear layer maps the resulting 1024 features to the 10 Fashion-MNIST classes.
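The halving pattern can be checked with plain arithmetic, without building the network. The sketch below is illustrative only: the helper names `out_size` and `trace_googlenet` are hypothetical, and the layer hyperparameters (a 7 × 7 stem convolution with stride 2 and padding 3, 3 × 3 max-pooling with stride 2 and padding 1, and the branch widths of the last Inception block in each module) are assumed from the standard D2L GoogLeNet implementation rather than stated in this section.

```python
def out_size(n, kernel, stride, padding):
    """Height/width after a convolution or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_googlenet(size=96):
    """Return (module, channels, height_or_width) after each module.

    All hyperparameters below are assumptions taken from the usual D2L
    GoogLeNet definition; they are not given in the text above.
    """
    def pool(n):
        # 3x3 max-pooling, stride 2, padding 1: halves the spatial size.
        return out_size(n, kernel=3, stride=2, padding=1)

    # Branch output widths of the LAST Inception block in modules 3 to 5.
    # An Inception block concatenates its four branches along channels.
    last_block = {
        3: (128, 192, 96, 64),
        4: (256, 320, 128, 128),
        5: (384, 384, 128, 128),
    }

    shapes = []
    # Module 1 (stem): 7x7 conv (stride 2, padding 3), then max-pool.
    size = pool(out_size(size, kernel=7, stride=2, padding=3))
    shapes.append(("Module 1", 64, size))
    # Module 2: 1x1 conv and 3x3 conv (padding 1) keep the size; pool halves it.
    size = pool(size)
    shapes.append(("Module 2", 192, size))
    # Modules 3 and 4: Inception blocks keep the size; pool halves it.
    for module in (3, 4):
        size = pool(size)
        shapes.append((f"Module {module}", sum(last_block[module]), size))
    # Module 5: global average pooling collapses height and width to 1.
    shapes.append(("Module 5", sum(last_block[5]), 1))
    return shapes


if __name__ == "__main__":
    for name, channels, hw in trace_googlenet(96):
        print(f"{name}: {channels} channels, {hw} x {hw}")
```

Under the same assumptions, calling `trace_googlenet(224)` gives spatial sizes of 56, 28, 14, and 7 for the first four modules, which shows how much larger the feature maps are at the original input resolution.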
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L