Learn Before
Concept

VGG-11 Layer-by-Layer Shape Trace

We can trace the dimensionality transformations of an input image (e.g., with a spatial shape of 224×224224 \times 224) as it passes through the VGG-11 network. The architecture halves the spatial height and width at each of the five VGG blocks due to the max-pooling operations. The resolution systematically drops from 224×224224 \times 224 to 112×112112 \times 112, 56×5656 \times 56, 28×2828 \times 28, 14×1414 \times 14, and finally reaches 7×77 \times 7. Meanwhile, the number of channels progressively expands up to 512512. The resulting 512×7×7512 \times 7 \times 7 feature map is then flattened into a 2508825088-dimensional representation before being fed into the fully connected dense layers.

0

1

Updated 2026-05-13

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L