Batch normalization can be integrated into the classic LeNet-5 architecture by inserting a batch normalization layer after each convolutional or fully connected hidden layer, before the corresponding activation function. The resulting network, sometimes called BNLeNet, keeps the layer progression of the original LeNet: two convolutional blocks (each followed by sigmoid activation and average pooling) and three fully connected layers with $$120$$, $$84$$, and $$10$$ output units. In the Dive into Deep Learning implementation, the final $$10$$-unit output layer is left unnormalized. For the convolutional layers, batch normalization operates on four-dimensional inputs and computes one mean and variance per channel, pooled over the batch and all spatial locations; for the fully connected layers it operates on two-dimensional inputs and computes one mean and variance per feature over the batch. The architecture can be implemented either from scratch with a custom batch normalization class or concisely with built-in high-level API layers (such as nn.LazyBatchNorm2d and nn.LazyBatchNorm1d in PyTorch). The concise version is nearly identical in structure but removes the need to specify the feature count and dimensionality arguments manually. Training this modified network on the Fashion-MNIST dataset with a batch size of $$128$$ and a learning rate of $$0.1$$ for $$10$$ epochs shows that batch normalization fits into an existing architecture without significant changes to the training pipeline.

Applying Batch Normalization to LeNet
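The concise variant of the batch-normalized LeNet described above might look roughly like the following PyTorch sketch. The channel counts ($$6$$ and $$16$$), the $$5 \times 5$$ kernels, the absence of padding, the helper name make_bn_lenet, and the random stand-in batch are assumptions made for illustration; only the layer order, the fully connected sizes, the batch size, and the learning rate come from the text.

```python
import torch
from torch import nn


def make_bn_lenet(num_classes: int = 10) -> nn.Sequential:
    """LeNet-style network with batch normalization before each sigmoid.

    Channel counts and kernel sizes are assumed (classic LeNet values).
    """
    return nn.Sequential(
        # Convolutional block 1: conv -> batch norm -> sigmoid -> avg pool
        nn.LazyConv2d(6, kernel_size=5), nn.LazyBatchNorm2d(),
        nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2),
        # Convolutional block 2
        nn.LazyConv2d(16, kernel_size=5), nn.LazyBatchNorm2d(),
        nn.Sigmoid(), nn.AvgPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        # Fully connected hidden layers: linear -> batch norm -> sigmoid
        nn.LazyLinear(120), nn.LazyBatchNorm1d(), nn.Sigmoid(),
        nn.LazyLinear(84), nn.LazyBatchNorm1d(), nn.Sigmoid(),
        # Output layer, left unnormalized
        nn.LazyLinear(num_classes),
    )


net = make_bn_lenet()

# Random stand-in for one Fashion-MNIST batch (128 grayscale 28x28 images)
X = torch.randn(128, 1, 28, 28)
y = torch.randint(0, 10, (128,))

# The first forward pass lets the lazy layers infer their input sizes
logits = net(X)

# One plain SGD step; the training loop itself needs no special handling
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss = nn.functional.cross_entropy(net(X), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The optimizer is created only after the first forward pass because the parameters of lazy layers are not materialized until they have seen an input.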

A complete batch normalization layer is designed by separating the core mathematical operations from framework-specific boilerplate code. A custom neural network layer handles the bookkeeping: it allocates the learnable model parameters (a scale vector $$\boldsymbol{\gamma}$$ initialized to $$1$$ and a shift vector $$\boldsymbol{\beta}$$ initialized to $$0$$) alongside non-model variables that track dataset statistics (a moving mean initialized to $$0$$ and a moving variance initialized to $$1$$). During the forward pass, this custom layer makes sure the moving statistics live on the same device as the input, invokes the core batch normalization computation, and stores the updated moving averages.
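The separation between core mathematics and bookkeeping can be sketched in PyTorch as follows. This is a minimal illustration, not a replacement for the built-in layers: the function name batch_norm, the class name BatchNorm, the values eps = 1e-5 and momentum = 0.1, and the use of the module's training flag to switch modes are all assumptions made here.

```python
import torch
from torch import nn


def batch_norm(X, gamma, beta, moving_mean, moving_var, training,
               eps=1e-5, momentum=0.1):
    """Core math only; eps and momentum defaults are assumed values."""
    if not training:
        # Prediction mode: normalize with the tracked dataset statistics
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        # Per-feature stats for 2D input, per-channel stats for 4D input
        dims = (0,) if X.dim() == 2 else (0, 2, 3)
        mean = X.mean(dim=dims, keepdim=True)
        var = ((X - mean) ** 2).mean(dim=dims, keepdim=True)
        X_hat = (X - mean) / torch.sqrt(var + eps)
        # Exponential moving averages, detached from the autograd graph
        moving_mean = (1.0 - momentum) * moving_mean + momentum * mean.detach()
        moving_var = (1.0 - momentum) * moving_var + momentum * var.detach()
    return gamma * X_hat + beta, moving_mean, moving_var


class BatchNorm(nn.Module):
    """Bookkeeping wrapper around batch_norm."""

    def __init__(self, num_features, num_dims):
        super().__init__()
        shape = (1, num_features) if num_dims == 2 else (1, num_features, 1, 1)
        # Learnable scale (ones) and shift (zeros)
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # Non-model variables tracking dataset statistics
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.ones(shape)

    def forward(self, X):
        # Keep the moving statistics on the same device as the input
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        Y, self.moving_mean, self.moving_var = batch_norm(
            X, self.gamma, self.beta, self.moving_mean, self.moving_var,
            training=self.training)
        return Y


bn = BatchNorm(3, num_dims=4)
X = torch.randn(8, 3, 5, 5) * 4 + 2   # batch with nonzero mean, large variance
Y = bn(X)                             # training mode: uses batch statistics
```

Because the moving statistics are stored as plain tensors rather than registered buffers, they would not be included in the module's saved state; a fuller implementation would likely register them with register_buffer.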


In deep learning frameworks, practitioners can define custom layers that carry their own learnable parameters, such as weights and biases, which are optimized through the standard training process. By subclassing the framework's base layer class, it is possible to design new layers whose mathematical transformations differ from any built-in layer in the library.

Implementing Custom Layers with Parameters

Dive into Deep Learning

When defining custom layers with learnable parameters, the parameters should be instantiated using the deep learning framework's built-in parameter functions. Using these native functions provides essential housekeeping functionality: they automatically manage parameter access, initialization, sharing, and model serialization (saving and loading). This ensures that developers do not need to write custom serialization routines for every new parameterized layer they create.

Parameter Management in Custom Layers
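The housekeeping described above can be observed with a small sketch in PyTorch. The ScaleShift layer is a hypothetical example invented for this illustration; the point is only that tensors wrapped in nn.Parameter are registered automatically, so access and serialization work without extra code.

```python
import io

import torch
from torch import nn


class ScaleShift(nn.Module):
    """Hypothetical layer: elementwise scale and shift per feature."""

    def __init__(self, num_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the module
        self.scale = nn.Parameter(torch.ones(num_features))
        self.shift = nn.Parameter(torch.zeros(num_features))

    def forward(self, X):
        return X * self.scale + self.shift


layer = ScaleShift(4)

# Parameter access comes for free
names = [name for name, _ in layer.named_parameters()]

# Change a value so the round trip below is not trivially the default
with torch.no_grad():
    layer.scale.fill_(2.0)

# Saving and loading need no layer-specific serialization code
buffer = io.BytesIO()
torch.save(layer.state_dict(), buffer)
buffer.seek(0)

clone = ScaleShift(4)
clone.load_state_dict(torch.load(buffer))
```

An in-memory buffer stands in for a file on disk here; passing a file path to torch.save and torch.load works the same way.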

A practical example of a parameterized custom layer is a user-defined fully connected layer. To implement this, the layer requires two arguments denoting the number of inputs and outputs, and it uses the framework's parameter functions to initialize a weight matrix and a bias vector. During the forward propagation step, the layer computes the matrix multiplication of the input tensor and the weights, adds the bias term, and typically applies a default activation function (such as ReLU) before returning the result.

Custom Fully Connected Layer Example
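A user-defined fully connected layer of the kind described above might be written as follows in PyTorch. The class name MyLinear, the argument names in_units and units, and the random-normal initialization are assumptions for this sketch.

```python
import torch
from torch import nn
from torch.nn import functional as F


class MyLinear(nn.Module):
    """Fully connected layer with a built-in ReLU activation."""

    def __init__(self, in_units, units):
        super().__init__()
        # Weight matrix and bias vector registered as learnable parameters
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units))

    def forward(self, X):
        # Matrix multiplication, bias addition, then ReLU
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)


linear = MyLinear(5, 3)
out = linear(torch.rand(2, 5))   # 2 examples, 5 inputs each -> 3 outputs each
```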

Once a custom layer—whether parameterless or equipped with learnable parameters—is defined, it can be seamlessly integrated as a standard component within more complex neural network architectures. For instance, it can be added to a sequential model alongside standard built-in layers. In this configuration, the custom layer receives the output from the preceding component, applies its predefined transformation during the forward pass, and passes the resulting activations to the subsequent layers in the sequence.
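As a rough illustration of this composition, the sketch below places a parameterless custom layer between two built-in linear layers in a PyTorch sequential model. The CenteredLayer class and the layer sizes are hypothetical choices made for the example.

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):
    """Hypothetical parameterless layer: subtracts the mean of its input."""

    def forward(self, X):
        return X - X.mean()


net = nn.Sequential(
    nn.Linear(8, 16),    # built-in layer feeding the custom layer
    CenteredLayer(),     # custom transformation applied in the forward pass
    nn.Linear(16, 4),    # subsequent layer consuming the centered activations
)

X = torch.rand(4, 8)
Y = net(X)

# Slicing a Sequential gives the sub-network up to the custom layer,
# which makes the intermediate activations easy to inspect
hidden = net[:2](X)
```

The activations in hidden should have a mean of approximately zero, up to floating-point error, which confirms that the custom layer ran as part of the sequence.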
