Example

GoogLeNet Layer-by-Layer Shape Trace

Passing a single-channel $96 \times 96$ image through GoogLeNet produces the following output shapes at each module:

  1. Module $b_1$ (stem): output $1 \times 64 \times 24 \times 24$
  2. Module $b_2$: output $1 \times 192 \times 12 \times 12$
  3. Module $b_3$ (2 Inception blocks): output $1 \times 480 \times 6 \times 6$
  4. Module $b_4$ (5 Inception blocks): output $1 \times 832 \times 3 \times 3$
  5. Module $b_5$ (2 Inception blocks + global average pooling): output $1 \times 1024$
  6. Linear (output layer): output $1 \times 10$
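
The spatial sizes in the list above can be checked with the standard output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below is a minimal pure-Python check, not the model itself. The kernel, stride, and padding values are assumptions based on the usual D2L-style GoogLeNet implementation (a $7 \times 7$ stride-2 convolution with padding 3 in the stem, and a $3 \times 3$ stride-2 max-pooling layer with padding 1 at the end of modules $b_1$ to $b_4$); the helper names `out_size` and `trace_spatial` are hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_spatial(n=96):
    """Return the height (= width) after each module for a square n x n input.

    Layer hyperparameters are assumed, not taken from the text above.
    """
    sizes = {}
    # b1 (stem): 7x7 conv, stride 2, padding 3, then 3x3 max-pool, stride 2, padding 1
    n = out_size(n, kernel=7, stride=2, padding=3)
    n = out_size(n, kernel=3, stride=2, padding=1)
    sizes["b1"] = n
    # b2 to b4: the convolutions / Inception blocks are assumed to preserve
    # the spatial size; only the closing max-pool changes it
    for name in ("b2", "b3", "b4"):
        n = out_size(n, kernel=3, stride=2, padding=1)
        sizes[name] = n
    # b5: global average pooling collapses any remaining map to 1 x 1
    sizes["b5"] = 1
    return sizes


if __name__ == "__main__":
    print(trace_spatial(96))
```

Under these assumptions, calling `trace_spatial(224)` gives the sizes for the original $224 \times 224$ input resolution, which makes the effect of the smaller input easy to compare.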

The input height and width are reduced from $224$ to $96$ to keep training time on Fashion-MNIST reasonable. The spatial size shrinks as $96 \to 24 \to 12 \to 6 \to 3 \to 1$: the stem $b_1$ reduces it by a factor of 4 (a stride-2 convolution followed by stride-2 max-pooling), the max-pooling layer at the end of each of $b_2$, $b_3$, and $b_4$ halves it, and the global average pooling in $b_5$ collapses the remaining $3 \times 3$ map to $1 \times 1$. Meanwhile, the number of channels grows ($64 \to 192 \to 480 \to 832 \to 1024$).
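
The channel counts after $b_3$, $b_4$, and $b_5$ follow from the fact that an Inception block concatenates its four branches along the channel axis, so its output width is the sum of the branch outputs. The sketch below does only that arithmetic. The per-branch channel numbers are not stated in the text above; they are assumed from the commonly used D2L-style GoogLeNet configuration, and the names `inception_out`, `module_channels`, `B3`, `B4`, and `B5` are hypothetical.

```python
def inception_out(c1, c2, c3, c4):
    """Output channels of one Inception block.

    c1: channels of the 1x1 branch
    c2: (1x1 reduce, 3x3 output) channels of the second branch
    c3: (1x1 reduce, 5x5 output) channels of the third branch
    c4: channels of the 1x1 conv after the pooling branch
    """
    return c1 + c2[1] + c3[1] + c4


# Assumed branch configurations (D2L-style), one tuple per Inception block.
B3 = [(64, (96, 128), (16, 32), 32),
      (128, (128, 192), (32, 96), 64)]
B4 = [(192, (96, 208), (16, 48), 64),
      (160, (112, 224), (24, 64), 64),
      (128, (128, 256), (24, 64), 64),
      (112, (144, 288), (32, 64), 64),
      (256, (160, 320), (32, 128), 128)]
B5 = [(256, (160, 320), (32, 128), 128),
      (384, (192, 384), (48, 128), 128)]


def module_channels(blocks):
    """Output channels after each Inception block in a module."""
    return [inception_out(*block) for block in blocks]


if __name__ == "__main__":
    for name, blocks in (("b3", B3), ("b4", B4), ("b5", B5)):
        print(name, module_channels(blocks))
```

The last entry of each list is the channel count that the module passes on, since the pooling layers that close each module leave the number of channels unchanged.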

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L