The generic AnyNet design space has $$17$$ configurable parameters, which yields a vast number of possible network configurations. These parameters determine the architecture of the network: the block widths (number of channels) $$c_0, \dots, c_4$$, where $$c_0$$ is the stem output width and $$c_1, \dots, c_4$$ are the stage widths; the depths (number of blocks per stage) $$d_1, \dots, d_4$$; the bottleneck ratios $$k_1, \dots, k_4$$ (where $$k_i \geq 1$$); and the group widths (number of channels per group in the grouped convolutions) $$g_1, \dots, g_4$$. That gives $$5 + 4 + 4 + 4 = 17$$ parameters in total.
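
As a quick check on the count, the parameters can be grouped into a small configuration object. This is only a minimal sketch: the class name `AnyNetConfig`, its field names, and the example values are hypothetical and are not part of the d2l library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnyNetConfig:
    """Hypothetical container for the AnyNet design-space parameters."""
    widths: Tuple[int, ...]        # c_0 (stem) followed by c_1..c_4 (stages)
    depths: Tuple[int, ...]        # d_1..d_4, blocks per stage
    bot_ratios: Tuple[float, ...]  # k_1..k_4, each >= 1
    group_widths: Tuple[int, ...]  # g_1..g_4

    def __post_init__(self):
        num_stages = len(self.depths)
        if len(self.widths) != num_stages + 1:
            raise ValueError("expected one stem width plus one width per stage")
        if (len(self.bot_ratios) != num_stages
                or len(self.group_widths) != num_stages):
            raise ValueError("expected one ratio and one group width per stage")
        if any(k < 1 for k in self.bot_ratios):
            raise ValueError("bottleneck ratios must satisfy k_i >= 1")

    def num_free_parameters(self) -> int:
        """Count every individually configurable value."""
        return (len(self.widths) + len(self.depths)
                + len(self.bot_ratios) + len(self.group_widths))


# Arbitrary illustrative values, not taken from any published network.
config = AnyNetConfig(
    widths=(32, 32, 80, 160, 320),
    depths=(4, 6, 8, 2),
    bot_ratios=(1, 1, 1, 1),
    group_widths=(16, 16, 16, 16),
)
print(config.num_free_parameters())  # 5 + 4 + 4 + 4 = 17
```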

AnyNet Design Space Parameters

An AnyNet stage is implemented by stacking `depth` ResNeXt blocks in sequence. The first block uses a stride of $$2$$, which halves the height and width of the feature map, together with a $$1 \times 1$$ convolution on the residual branch so that its output matches the downsampled shape. The remaining blocks use the default stride of $$1$$ and preserve the spatial dimensions.

PyTorch Implementation:
```python
@d2l.add_to_class(AnyNet)
def stage(self, depth, num_channels, groups, bot_mul):
    blk = []
    for i in range(depth):
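        if i == 0:
            # First block: stride 2 halves height and width; the 1x1
            # convolution lets the residual branch match the new shape.
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                use_1x1conv=True, strides=2))
        else:
            # Remaining blocks keep resolution and channel count unchanged.
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul))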
    return nn.Sequential(*blk)
```

MXNet Implementation:
```python
@d2l.add_to_class(AnyNet)
def stage(self, depth, num_channels, groups, bot_mul):
    net = nn.Sequential()
    for i in range(depth):
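        if i == 0:
            # First block: stride 2 halves height and width; the 1x1
            # convolution lets the residual branch match the new shape.
            net.add(d2l.ResNeXtBlock(
                num_channels, groups, bot_mul, use_1x1conv=True, strides=2))
        else:
            # Remaining blocks take the same arguments as the first block,
            # minus the downsampling options.
            net.add(d2l.ResNeXtBlock(num_channels, groups, bot_mul))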
    return net
```

JAX Implementation:
```python
@d2l.add_to_class(AnyNet)
def stage(self, depth, num_channels, groups, bot_mul):
    blk = []
    for i in range(depth):
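        if i == 0:
            # First block: stride 2 halves height and width; the 1x1
            # convolution lets the residual branch match the new shape.
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                use_1x1conv=True, strides=(2, 2), training=self.training))
        else:
            # Remaining blocks keep resolution and channel count unchanged.
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                                        training=self.training))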
    return nn.Sequential(blk)
```

TensorFlow Implementation:
```python
@d2l.add_to_class(AnyNet)
def stage(self, depth, num_channels, groups, bot_mul):
    net = tf.keras.models.Sequential()
    for i in range(depth):
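        if i == 0:
            # First block: stride 2 halves height and width; the 1x1
            # convolution lets the residual branch match the new shape.
            net.add(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                use_1x1conv=True, strides=2))
        else:
            # Remaining blocks keep resolution and channel count unchanged.
            net.add(d2l.ResNeXtBlock(num_channels, groups, bot_mul))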
    return net
```

AnyNet Stage Code Implementation

The body of an AnyNet architecture performs most of the feature extraction, operating on feature maps at progressively lower resolutions. It is divided into multiple stages, typically $$4$$ for ImageNet inputs, where the network as a whole reduces a $$224 \times 224 \times 3$$ image to a $$7 \times 7 \times c_4$$ representation. Each stage halves both height and width, cutting the number of spatial positions to a quarter, and is built from a sequence of ResNeXt blocks. The first block in a stage applies a stride of $$2$$ to downsample the feature map, which requires a $$1 \times 1$$ convolution on the residual branch to match dimensions; all subsequent blocks in the stage keep the resolution and channel count constant.
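
The shape bookkeeping for the body can be traced with a few lines of plain Python. This is a sketch under stated assumptions: the helper `body_shapes` is hypothetical, the stage widths are arbitrary, feature-map sizes are assumed to be even so that halving is exact, and the body is assumed to receive a $$112 \times 112$$ feature map (a $$224 \times 224$$ image already halved once before the body).

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def body_shapes(body_input: Shape, stage_widths: List[int]) -> List[Shape]:
    """Return the feature-map shape after each stage of the body.

    Assumes each stage halves height and width exactly and sets the
    channel count to that stage's width.
    """
    height, width, _ = body_input
    shapes = []
    for channels in stage_widths:
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes


# Arbitrary stage widths c_1..c_4; the last shape is 7 x 7 x c_4.
for shape in body_shapes((112, 112, 32), [32, 80, 160, 320]):
    print(shape)
```

With four stages the spatial size goes from 112 to 56, 28, 14, and finally 7, which matches the $$7 \times 7 \times c_4$$ output described above.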


The AnyNet design space provides a generic architectural template for constructing and exploring families of convolutional neural networks. This macro-structure is composed of three primary components: a stem that performs the initial image processing, a body that carries out the bulk of the transformations to build object representations across multiple stages, and a head that converts these representations into the final desired outputs (e.g., via a softmax regressor). Within this generic framework, design choices such as stage depth, channel counts, and block structures (often using ResNeXt blocks) parameterize a vast space of potential network configurations.

AnyNet Design Space

Dive into Deep Learning

The stem of an AnyNet architecture serves as the initial processing stage for RGB input images ($$3$$ channels). It typically applies a $$3 \times 3$$ convolution with a stride of $$2$$, followed by batch normalization. This sequence halves the spatial resolution of the image (from $$r \times r$$ to $$r/2 \times r/2$$) and generates an initial set of $$c_0$$ channels that serve as the input for the network's body.
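
The halving can be checked with the standard output-size formula for a convolution along one dimension, $$\lfloor (r + 2p - k) / s \rfloor + 1$$. The sketch below assumes a padding of $$p = 1$$, which the text above does not state; the function name `conv_output_size` is likewise only illustrative.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2,
                     padding: int = 1) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# With a 3 x 3 kernel, stride 2, and the assumed padding of 1,
# an even input size r maps to r / 2.
for r in (224, 96, 32):
    print(r, "->", conv_output_size(r))
```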

AnyNet Stem

AnyNet Body

The head of an AnyNet architecture converts the feature representations extracted by the network body into the final desired predictions. It employs an entirely standard design consisting of global average pooling to collapse spatial dimensions, followed by a fully connected layer to emit an $$n$$-dimensional vector for $$n$$-class classification.
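
To make the two steps concrete, here is a framework-free sketch of a head on a toy input. The function names, the nested-list layout, and all numbers are illustrative assumptions; a real model would use the pooling and dense layers of its deep learning framework.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def global_avg_pool(features: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions."""
    pooled = []
    for channel in features:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def linear(x: List[float], weight: List[List[float]],
           bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Toy input: 2 channels on a 2 x 2 grid, mapped to n = 3 class scores.
features = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1, mean 2.0
]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]

pooled = global_avg_pool(features)     # one value per channel
logits = linear(pooled, weight, bias)  # one score per class
print(pooled)  # [4.0, 2.0]
print(logits)  # [4.0, 2.0, 6.5]
```

Pooling removes the spatial dimensions regardless of the input resolution, so the fully connected layer only needs to know the final channel count.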
