1Cademy - LeNet-5 Architecture

Learn Before

Classic Convolutional Neural Network Architectures for Object Detection in Images

Concept

LeNet-5 Architecture

LeNet-5 is a pioneering convolutional neural network consisting of two main parts: (i) a convolutional encoder with two convolutional layers, and (ii) a dense block with three fully connected layers. The input is a handwritten digit image, and the output is a probability distribution over $10$ possible classes.
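The final dense layer produces 10 raw scores. The page does not name the function that turns them into a probability distribution, so this sketch assumes a softmax, which is the usual choice. It is plain Python, and the `logits` values are made up purely for illustration:

```python
import math

def softmax(logits):
    """Map raw class scores to a probability distribution (numerically stable)."""
    m = max(logits)                          # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical final-layer outputs (logits) for one image, one per digit class 0-9.
logits = [0.1, 2.3, -1.0, 0.5, 0.0, 1.2, -0.3, 3.1, 0.7, -2.0]
probs = softmax(logits)
predicted_digit = max(range(10), key=lambda c: probs[c])
print(predicted_digit)        # 7, the index of the largest logit
print(round(sum(probs), 6))   # 1.0
```

Because softmax is monotonic, the predicted class is the same whether you take the arg-max of the logits or of the probabilities.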

Convolutional encoder: Each convolutional block applies a $5 \times 5$ convolution kernel, followed by a sigmoid activation function and a $2 \times 2$ average pooling operation with stride $2$. The first convolutional layer produces $6$ output channels, and the second produces $16$ output channels. Each pooling operation halves both height and width, so it reduces the number of spatial positions by a factor of $4$. The output of each convolutional block has shape (batch size, number of channels, height, width).
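The spatial shapes through the encoder follow from the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The page gives neither the input size nor the padding. This sketch assumes a $28 \times 28$ input (the MNIST / Fashion-MNIST size) with padding $2$ on the first convolution and none on the second. Those values, and the helper names `conv2d_out`, `trace_encoder`, and `avg_pool_2x2`, are illustrative assumptions, not part of the original description:

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output height/width of a conv or pooling window: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_encoder(batch=1, height=28, width=28):
    """Return (layer name, (batch, channels, height, width)) after each encoder stage."""
    shapes = []
    # Block 1: 5x5 conv (padding 2 assumed) -> 6 channels, then sigmoid (shape unchanged).
    h, w = conv2d_out(height, 5, padding=2), conv2d_out(width, 5, padding=2)
    shapes.append(("conv1 + sigmoid", (batch, 6, h, w)))
    # 2x2 average pooling with stride 2 halves height and width.
    h, w = conv2d_out(h, 2, stride=2), conv2d_out(w, 2, stride=2)
    shapes.append(("avgpool1", (batch, 6, h, w)))
    # Block 2: 5x5 conv (no padding) -> 16 channels, then sigmoid.
    h, w = conv2d_out(h, 5), conv2d_out(w, 5)
    shapes.append(("conv2 + sigmoid", (batch, 16, h, w)))
    h, w = conv2d_out(h, 2, stride=2), conv2d_out(w, 2, stride=2)
    shapes.append(("avgpool2", (batch, 16, h, w)))
    return shapes

def avg_pool_2x2(grid):
    """2x2 average pooling with stride 2 on a single-channel 2D list."""
    return [
        [(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4
         for j in range(0, len(grid[0]) - 1, 2)]
        for i in range(0, len(grid) - 1, 2)
    ]

for name, shape in trace_encoder():
    print(f"{name:16s} {shape}")

# 16 input positions -> 4 output positions: the factor-of-4 reduction.
pooled = avg_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

Under these assumptions the encoder maps each $1 \times 28 \times 28$ image to a $16 \times 5 \times 5$ feature map.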

Dense block: The four-dimensional output of the convolutional encoder is flattened into a two-dimensional representation. The first dimension indexes examples in the minibatch, and the second holds each example's flattened feature vector. Three fully connected layers with $120$, $84$, and $10$ outputs respectively form the dense block, and the $10$-dimensional final layer corresponds to the number of output classes.
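Flattening and the dense block's size can be checked with a little arithmetic. This plain-Python sketch assumes the encoder output is $16 \times 5 \times 5$, which is what a $28 \times 28$ input with padding $2$ in the first convolution yields. That gives $400$ features per example. The helper names `flatten` and `dense_param_count` are hypothetical:

```python
def flatten(batch):
    """Flatten a (batch, C, H, W) nested list into (batch, C*H*W), keeping the batch axis."""
    return [[v for channel in example for row in channel for v in row] for example in batch]

def dense_param_count(in_features, layer_sizes):
    """Weights + biases for each fully connected layer in a stack."""
    counts = []
    for out_features in layer_sizes:
        # Weight matrix has in_features * out_features entries, plus one bias per output.
        counts.append(in_features * out_features + out_features)
        in_features = out_features
    return counts

# Placeholder encoder output for a minibatch of 2 examples: shape (2, 16, 5, 5).
encoder_out = [[[[0.0] * 5 for _ in range(5)] for _ in range(16)] for _ in range(2)]
flat = flatten(encoder_out)
print(len(flat), len(flat[0]))                  # 2 400
print(dense_param_count(400, [120, 84, 10]))    # [48120, 10164, 850]
```

Under this assumption, the first dense layer ($400 \to 120$) accounts for most of the dense block's parameters.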

Weights are initialized using Xavier initialization.
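Xavier (Glorot) initialization scales the initial weights by the layer's fan-in and fan-out so that activation variance stays roughly constant across layers. The page does not say whether the uniform or the normal variant is used. This sketch assumes the uniform variant, $W \sim U(-a, a)$ with $a = \sqrt{6 / (n_\text{in} + n_\text{out})}$, which gives variance $2 / (n_\text{in} + n_\text{out})$. It applies that to the first dense layer ($400 \to 120$, assuming a $16 \times 5 \times 5$ encoder output). The function name `xavier_uniform` is illustrative:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_out x fan_in weight matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)                 # fixed seed so the sample is reproducible
W = xavier_uniform(400, 120, rng)      # first dense layer: 400 inputs -> 120 outputs
bound = math.sqrt(6.0 / (400 + 120))
values = [w for row in W for w in row]
empirical_var = sum(v * v for v in values) / len(values)
print(round(bound, 4))                 # 0.1074
print(empirical_var, 2.0 / (400 + 120))  # empirical variance vs. the target 2 / (fan_in + fan_out)
```

For a convolutional layer, the same formula is commonly applied with fan-in and fan-out multiplied by the kernel area (for example, $6 \cdot 5 \cdot 5$ fan-in for the second convolution).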

The LeNet-5 design remains instructive today. On Fashion-MNIST, LeNet-5's error rate is much closer to that of advanced architectures such as ResNet (Section 8.6) than to the error rates achievable with basic MLPs (Section 5.2). This illustrates the substantial leap in performance that convolutional architectures brought over fully connected networks.


Updated 2026-05-18
