GoogLeNet Model Architecture

The GoogLeNet model is constructed from five sequential modules (labeled $b_1$ through $b_5$) followed by a fully connected output layer. The overall architecture diagram is shown in Fig. 8.4.2.

  • Module $b_1$ (Stem): A $7 \times 7$ convolutional layer with $64$ output channels, stride $2$, and padding $3$, followed by ReLU activation and a $3 \times 3$ max-pooling layer (stride $2$, padding $1$). This module resembles the stems of AlexNet and LeNet.
  • Module $b_2$: A $1 \times 1$ convolution with $64$ channels, then a $3 \times 3$ convolution that triples the channels to $192$, each followed by ReLU, concluding with $3 \times 3$ max-pooling (stride $2$, padding $1$).
  • Module $b_3$: Two Inception blocks producing $64+128+32+32=256$ and $128+192+96+64=480$ output channels respectively, followed by $3 \times 3$ max-pooling (stride $2$, padding $1$).
  • Module $b_4$: Five Inception blocks producing $512$, $512$, $512$, $528$, and $832$ output channels respectively, followed by $3 \times 3$ max-pooling (stride $2$, padding $1$).
  • Module $b_5$: Two Inception blocks producing $832$ and $1024$ output channels respectively, followed by global average pooling (reducing each channel to $1 \times 1$) and a flatten operation.
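
To make the module list above concrete, the following sketch traces the channel count and spatial size after each module using the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. It is an illustration only, not the reference implementation. It assumes a $96 \times 96$ input and padding $1$ on the $3 \times 3$ convolution in $b_2$, neither of which is specified above, and it uses stride $2$ with padding $1$ for every max-pooling layer. The helper names `out_size` and `trace_googlenet` are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_googlenet(size=96):
    """Return (module, channels, spatial size) after each of the five modules.

    `size` is the height/width of a square input (96 is an assumption).
    Inception blocks are treated as preserving the spatial size.
    """
    shapes = []

    # b1: 7x7 conv (stride 2, padding 3), then 3x3 max-pool (stride 2, padding 1)
    size = out_size(size, kernel=7, stride=2, padding=3)
    size = out_size(size, kernel=3, stride=2, padding=1)
    shapes.append(("b1", 64, size))

    # b2: 1x1 conv, 3x3 conv (padding 1 is an assumption), then max-pool
    size = out_size(size, kernel=1)
    size = out_size(size, kernel=3, stride=1, padding=1)
    size = out_size(size, kernel=3, stride=2, padding=1)
    shapes.append(("b2", 192, size))

    # b3: two Inception blocks; the last one outputs the concatenated branches
    b3_channels = 128 + 192 + 96 + 64  # 480
    size = out_size(size, kernel=3, stride=2, padding=1)
    shapes.append(("b3", b3_channels, size))

    # b4: five Inception blocks ending at 832 channels, then max-pool
    size = out_size(size, kernel=3, stride=2, padding=1)
    shapes.append(("b4", 832, size))

    # b5: two Inception blocks ending at 1024 channels, then global average
    # pooling, which collapses each channel to a single value
    shapes.append(("b5", 1024, 1))

    return shapes


for name, channels, size in trace_googlenet(96):
    print(f"{name}: {channels} channels, {size}x{size}")
```

Under these assumptions, the spatial size shrinks as 24, 12, 6, 3, 1 across the five modules, while the channel count grows from 64 to 1024. Calling `trace_googlenet(224)` gives 56, 28, 14, 7, 1 instead.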

Finally, a fully connected layer maps the $1024$-dimensional representation to the number of output classes.
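
As a rough illustration of how small this output layer is, the sketch below counts its learnable parameters (one weight per input-output pair plus one bias per class). The function name `fc_param_count` and the class counts of 10 and 1000 are assumptions chosen for the example, not values given in this section.

```python
def fc_param_count(in_features=1024, num_classes=10):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    weights = in_features * num_classes
    biases = num_classes
    return weights + biases


# Hypothetical class counts: 10 (a small benchmark) and 1000 (ImageNet-style)
print(fc_param_count(num_classes=10))
print(fc_param_count(num_classes=1000))
```

Because global average pooling has already reduced each of the 1024 channels to a single value, the layer's size depends only on the number of classes and not on the input resolution.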

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L