1Cademy - GoogLeNet Model Architecture

The GoogLeNet model is constructed from five sequential modules (labeled $b_1$ through $b_5$) followed by a fully connected output layer. The overall architecture diagram is shown in Fig. 8.4.2.

Module $b_1$ (Stem): A $7 \times 7$ convolutional layer with $64$ output channels, stride $2$, and padding $3$, followed by ReLU activation and a $3 \times 3$ max-pooling layer (stride $2$, padding $1$). This module resembles the stems of AlexNet and LeNet.
Module $b_2$: A $1 \times 1$ convolution with $64$ channels, then a $3 \times 3$ convolution that triples the channels to $192$, each followed by ReLU, concluding with $3 \times 3$ max-pooling (stride $2$, padding $1$).
Module $b_3$: Two Inception blocks producing $64+128+32+32=256$ and $128+192+96+64=480$ output channels respectively, followed by $3 \times 3$ max-pooling.
Module $b_4$: Five Inception blocks producing $512$, $512$, $512$, $528$, and $832$ output channels respectively, followed by $3 \times 3$ max-pooling.
Module $b_5$: Two Inception blocks producing $832$ and $1024$ output channels respectively, followed by global average pooling (reducing each channel to $1 \times 1$) and a flatten operation.
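
The channel and resolution bookkeeping in the five modules can be checked with a few lines of plain Python. The sketch below is illustrative only and is not the reference implementation: the helper names `out_size` and `trace_googlenet` are hypothetical, and it assumes that the $3 \times 3$ convolution in $b_2$ uses padding $1$, that the max-pooling layers in $b_3$ and $b_4$ use the same stride $2$ and padding $1$ as in $b_1$ and $b_2$, and that every Inception block preserves the spatial size. The $96 \times 96$ input in the usage loop is likewise an assumed example size.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output size along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_googlenet(size):
    """Return [(module, output channels, spatial size)] for a square input."""

    def pool(n):
        # 3x3 max-pooling with stride 2 and padding 1
        # (assumed for b3 and b4, stated in the text for b1 and b2).
        return out_size(n, kernel=3, stride=2, padding=1)

    shapes = []

    # b1: 7x7 convolution (64 channels, stride 2, padding 3), then max-pooling.
    size = pool(out_size(size, kernel=7, stride=2, padding=3))
    shapes.append(("b1", 64, size))

    # b2: 1x1 convolution (64 channels), 3x3 convolution (192 channels,
    # padding 1 assumed), then max-pooling.
    size = out_size(size, kernel=1)
    size = out_size(size, kernel=3, padding=1)
    size = pool(size)
    shapes.append(("b2", 192, size))

    # b3: the four branch widths of each Inception block are concatenated,
    # so the block's output channels are their sum (256, then 480).
    b3_branches = [(64, 128, 32, 32), (128, 192, 96, 64)]
    b3_channels = [sum(branches) for branches in b3_branches]
    size = pool(size)
    shapes.append(("b3", b3_channels[-1], size))

    # b4: per-block totals as listed in the text; Inception blocks are
    # assumed to keep the spatial size, so only the pooling shrinks it.
    b4_channels = [512, 512, 512, 528, 832]
    size = pool(size)
    shapes.append(("b4", b4_channels[-1], size))

    # b5: global average pooling reduces every channel to 1x1.
    b5_channels = [832, 1024]
    shapes.append(("b5", b5_channels[-1], 1))
    return shapes


for name, channels, size in trace_googlenet(96):
    print(f"{name}: {channels} x {size} x {size}")
```

Under these assumptions, a $96 \times 96$ input leaves the modules with shapes $64 \times 24 \times 24$, $192 \times 12 \times 12$, $480 \times 6 \times 6$, $832 \times 3 \times 3$, and $1024 \times 1 \times 1$, so the fully connected output layer receives a $1024$-dimensional vector after flattening.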