Concept

LeNet-5 Architecture

LeNet-5 is a pioneering convolutional neural network consisting of two main parts: (i) a convolutional encoder with two convolutional layers, and (ii) a dense block with three fully connected layers. The input is a handwritten digit image, and the output is a probability distribution over $10$ possible classes.

Convolutional encoder: Each convolutional block applies a $5 \times 5$ kernel followed by a sigmoid activation function and a $2 \times 2$ average pooling operation (stride $2$). The first convolutional layer produces $6$ output channels, and the second produces $16$ output channels. Each pooling operation reduces spatial dimensionality by a factor of $4$ (halving both height and width). The convolutional block's output shape is (batch size, number of channels, height, width).
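The shape arithmetic of the encoder can be traced with a short sketch. The text above does not state the input size or the padding, so the values here are assumptions: a $28 \times 28$ single-channel input, padding of 2 on the first convolution, and no padding on the second. The helper names `conv_out` and `lenet_encoder_shapes` are hypothetical, chosen only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_encoder_shapes(height=28, width=28, first_padding=2):
    """Trace (layer, channels, height, width) through the two convolutional blocks.

    Assumed, not stated in the text: the 28x28 input and the padding values.
    """
    shapes = []
    # (output channels, padding) for the two 5x5 convolutions
    for out_channels, padding in ((6, first_padding), (16, 0)):
        height = conv_out(height, 5, padding=padding)
        width = conv_out(width, 5, padding=padding)
        shapes.append(("conv", out_channels, height, width))
        # 2x2 average pooling with stride 2 halves height and width
        height = conv_out(height, 2, stride=2)
        width = conv_out(width, 2, stride=2)
        shapes.append(("pool", out_channels, height, width))
    return shapes


for layer, c, h, w in lenet_encoder_shapes():
    print(f"{layer}: channels={c}, height={h}, width={w}")
```

Under these assumptions the encoder ends with 16 channels of size $5 \times 5$, that is, 400 values per example.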

Dense block: The four-dimensional output of the convolutional encoder is flattened into a two-dimensional representation where the first dimension indexes examples in the minibatch and the second gives each example's flat vector. Three fully connected layers with $120$, $84$, and $10$ outputs respectively form the dense block, with the $10$-dimensional final layer corresponding to the number of output classes.
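The flattening step and the size of the dense block can be illustrated with plain Python lists. This is a minimal sketch, not a trainable model: it assumes the encoder output is 16 channels of size $5 \times 5$ (which follows only if the input is $28 \times 28$ with padding 2 on the first convolution), and the helper names `flatten` and `dense_param_count` are hypothetical.

```python
def flatten(batch):
    """Flatten nested (batch, channels, height, width) lists to (batch, features)."""
    return [
        [value for channel in example for row in channel for value in row]
        for example in batch
    ]


def dense_param_count(in_features, layer_sizes=(120, 84, 10)):
    """Count weights and biases of the fully connected layers."""
    total = 0
    for out_features in layer_sizes:
        total += in_features * out_features + out_features  # weights + biases
        in_features = out_features
    return total


# Assumed encoder output: minibatch of 2 examples, 16 channels, 5x5 each
batch = [[[[0.0] * 5 for _ in range(5)] for _ in range(16)] for _ in range(2)]
flat = flatten(batch)

print(len(flat), len(flat[0]))            # examples, features per example
print(dense_param_count(len(flat[0])))    # parameters in the dense block
```

The first dimension of `flat` still indexes examples, and each example is now a single vector that the first fully connected layer can consume.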

Weights are initialized using Xavier initialization.
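The text does not say which Xavier variant is used, so the sketch below assumes the uniform form, which draws each weight from $U(-a, a)$ with $a = \sqrt{6 / (n_\text{in} + n_\text{out})}$. The function name `xavier_uniform` and the 400-input, 120-output layer size are illustrative assumptions.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_out x fan_in weight matrix drawn from U(-a, a).

    a = sqrt(6 / (fan_in + fan_out)); the uniform variant is an assumption,
    since the text only says "Xavier initialization".
    """
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = xavier_uniform(400, 120, rng)
bound = math.sqrt(6.0 / (400 + 120))
print(len(weights), len(weights[0]), round(bound, 4))
```

Scaling the range by both the fan-in and the fan-out keeps the variance of activations and gradients roughly constant across layers, which matters for sigmoid activations that saturate easily.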

LeNet-5 remains architecturally meaningful today. When evaluated on Fashion-MNIST, its error rates are much closer to those of advanced architectures such as ResNet (Section 8.6) than to those achievable with basic MLPs (Section 5.2). This shows how large a gain convolutional architectures delivered over fully connected networks.

Updated 2026-05-18

Tags

Data Science

D2L

Dive into Deep Learning @ D2L