AnyNet Body

The body of an AnyNet architecture performs most of the feature extraction, operating on feature maps at progressively lower resolutions. The network is typically designed for ImageNet inputs: a $224 \times 224 \times 3$ image is reduced to a $7 \times 7 \times c_4$ representation, where $c_4$ is the channel count of the final stage. The stem already halves the resolution once, and the body then halves it in each of its stages (e.g., $4$ stages), so $224 / 2^{1+4} = 7$. Each stage therefore cuts the number of spatial positions to a quarter (by halving both height and width) and is built from a sequence of ResNeXt blocks. The first block in a stage uses a stride of $2$ to downsample the feature map, which requires a $1 \times 1$ convolution on the shortcut connection so that its output matches the dimensions of the main branch. All subsequent blocks in the stage keep the resolution and channel count constant.
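The resolution schedule described above can be traced with a short Python sketch. It is only an illustration under stated assumptions: the function name `anynet_body_shapes`, the per-stage depths and widths, and the stem width are hypothetical values chosen for the example, and the stem is assumed to halve the input once before the body begins.

```python
def anynet_body_shapes(input_size=224,
                       depths=(4, 6, 8, 16),         # hypothetical blocks per stage
                       widths=(32, 80, 168, 336),    # hypothetical channels c_1..c_4
                       stem_width=32):               # hypothetical stem channels
    """Trace (name, spatial size, channels, needs 1x1 shortcut) per block."""
    size = input_size // 2  # assumption: the stem halves the resolution once
    trace = [("stem", size, stem_width, False)]
    for stage, (depth, channels) in enumerate(zip(depths, widths), start=1):
        for block in range(depth):
            first = block == 0
            if first:
                size //= 2  # first block of a stage uses stride 2
            # Only the first block changes resolution (and channels), so only
            # it needs a 1x1 convolution on the shortcut to match shapes.
            trace.append((f"stage{stage}.block{block}", size, channels, first))
    return trace


if __name__ == "__main__":
    for name, size, channels, shortcut in anynet_body_shapes():
        print(f"{name:>16}: {size}x{size}x{channels}  1x1 shortcut: {shortcut}")
```

With a 224-pixel input, the spatial size goes 112 after the stem and then 56, 28, 14, and 7 across the four stages, which matches $224 / 2^{1+4} = 7$.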

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L
