1Cademy - GoogLeNet Layer-by-Layer Shape Trace

Learn Before

GoogLeNet Model Architecture

Example


Passing a single-channel $96 \times 96$ image through GoogLeNet produces the following output shape after each module:

Module $b_1$ (Stem): output $1 \times 64 \times 24 \times 24$
Module $b_2$: output $1 \times 192 \times 12 \times 12$
Module $b_3$ (2 Inception blocks): output $1 \times 480 \times 6 \times 6$
Module $b_4$ (5 Inception blocks): output $1 \times 832 \times 3 \times 3$
Module $b_5$ (2 Inception blocks + global avg pool): output $1 \times 1024$
Linear (output layer): output $1 \times 10$
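
The spatial sizes listed above can be checked with plain arithmetic. The sketch below is not taken from this page: it assumes the layer hyperparameters of the Dive into Deep Learning implementation (a $7 \times 7$ stride-2 convolution with padding 3 in the stem, and a $3 \times 3$ stride-2 max-pool with padding 1 closing modules $b_1$ through $b_4$), and the helper names `out_size` and `trace_spatial` are hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_spatial(n):
    """Return the height (= width) after each GoogLeNet module for an n x n input.

    Assumed hyperparameters (from the d2l implementation, not from this page):
    every layer not listed here preserves the spatial size.
    """
    sizes = {}
    # b1: 7x7 conv, stride 2, padding 3, then 3x3 max-pool, stride 2, padding 1
    n = out_size(n, kernel=7, stride=2, padding=3)
    n = out_size(n, kernel=3, stride=2, padding=1)
    sizes["b1"] = n
    # b2, b3, b4: only the closing 3x3 max-pool (stride 2, padding 1) shrinks the map
    for name in ("b2", "b3", "b4"):
        n = out_size(n, kernel=3, stride=2, padding=1)
        sizes[name] = n
    # b5: global average pooling always yields a 1x1 map
    sizes["b5"] = 1
    return sizes


print(trace_spatial(96))
```

Under these assumptions, `trace_spatial(96)` gives 24, 12, 6, 3 and 1 for modules $b_1$ to $b_5$, matching the list, and `trace_spatial(224)` gives 56, 28, 14, 7 and 1 for the original input size.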

The input height and width are reduced from $224$ to $96$ to keep training time on Fashion-MNIST reasonable. The stem ($b_1$) shrinks the spatial size by a factor of four ($96 \to 24$), and the max-pooling layer that closes each of modules $b_2$ through $b_4$ halves it again ($24 \to 12 \to 6 \to 3$), while the number of channels grows ($64 \to 192 \to 480 \to 832 \to 1024$). The global average pooling in module $b_5$ then collapses the remaining $3 \times 3$ map to $1 \times 1$, leaving a $1024$-dimensional vector for the linear output layer.
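
The channel counts for modules $b_3$ to $b_5$ follow from how an Inception block concatenates its four branches along the channel axis. The sketch below assumes the per-branch channel settings used in the Dive into Deep Learning implementation; they are not stated on this page, and the names `inception_out_channels` and `MODULES` are hypothetical.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four branches of one Inception block.

    c1: 1x1 branch; c2: (1x1 reduce, 3x3 out); c3: (1x1 reduce, 5x5 out);
    c4: 1x1 conv after the pooling branch. Only the branch outputs are summed.
    """
    return c1 + c2[1] + c3[1] + c4


# Assumed branch settings (d2l implementation), one tuple per Inception block.
MODULES = {
    "b3": [(64, (96, 128), (16, 32), 32),
           (128, (128, 192), (32, 96), 64)],
    "b4": [(192, (96, 208), (16, 48), 64),
           (160, (112, 224), (24, 64), 64),
           (128, (128, 256), (24, 64), 64),
           (112, (144, 288), (32, 64), 64),
           (256, (160, 320), (32, 128), 128)],
    "b5": [(256, (160, 320), (32, 128), 128),
           (384, (192, 384), (48, 128), 128)],
}

# Channels after every block; the last entry per module is the module's output.
per_block = {name: [inception_out_channels(*cfg) for cfg in blocks]
             for name, blocks in MODULES.items()}
print(per_block)
```

With these settings the last block of each module outputs 480, 832 and 1024 channels respectively, in line with the trace above.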


Updated 2026-05-13


References

Dive into Deep Learning
