LeNet-5 Architecture
LeNet-5 is a pioneering convolutional neural network consisting of two main parts: (i) a convolutional encoder with two convolutional layers, and (ii) a dense block with three fully connected layers. The input is a handwritten digit image, and the output is a probability distribution over possible classes.
Convolutional encoder: Each convolutional block applies a 5×5 kernel followed by a sigmoid activation function and a 2×2 average pooling operation (stride 2). The first convolutional layer produces 6 output channels, and the second produces 16 output channels. Each pooling operation reduces spatial dimensionality by a factor of 4 (halving both height and width). The convolutional block's output shape is (batch size, number of channels, height, width).
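To make the shape bookkeeping concrete, the following is a minimal sketch that traces tensor shapes through a LeNet-style encoder using plain Python arithmetic. It assumes a 28×28 single-channel input and a padding of 2 on the first convolution only; neither value is stated in the text above, so treat both as illustrative assumptions. The helper names conv_out and encoder_shapes are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a sliding window
    (convolution or pooling) with the given kernel, stride, and padding."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(batch_size=1, height=28, width=28, first_padding=2):
    """Trace (batch, channels, height, width) through two conv blocks.

    Assumptions (not taken from the text): 28x28 input and padding=2 on
    the first convolution, no padding on the second.
    """
    shapes = []
    h, w = height, width
    # (output channels, padding) for the two convolutional layers
    for out_channels, padding in ((6, first_padding), (16, 0)):
        # 5x5 convolution followed by a sigmoid (the sigmoid keeps the shape)
        h = conv_out(h, 5, padding=padding)
        w = conv_out(w, 5, padding=padding)
        shapes.append(("conv 5x5 + sigmoid", (batch_size, out_channels, h, w)))
        # 2x2 average pooling with stride 2 halves height and width
        h = conv_out(h, 2, stride=2)
        w = conv_out(w, 2, stride=2)
        shapes.append(("avg pool 2x2", (batch_size, out_channels, h, w)))
    return shapes


for name, shape in encoder_shapes():
    print(f"{name:20s} {shape}")
```

Under these assumptions the encoder ends with 16 channels of size 5×5 per example. Changing the input size or the padding changes the final spatial size, which in turn changes the length of the flattened vector passed on to the dense layers.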
Dense block: The four-dimensional output of the convolutional encoder is flattened into a two-dimensional representation where the first dimension indexes examples in the minibatch and the second gives each example's flat vector. Three fully connected layers with 120, 84, and 10 outputs, respectively, form the dense block; the 10-dimensional final layer corresponds to the number of output classes.
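As a rough illustration of the flattening step and the layer widths, the sketch below computes the input size, output size, and parameter count (weights plus biases) of each fully connected layer. It assumes the encoder emits 16 channels of size 5×5 per example, which holds for a 28×28 input with padding 2 on the first convolution; that feature-map size is an assumption, not something the text states. The function name dense_block_summary is hypothetical.

```python
def dense_block_summary(channels=16, height=5, width=5, widths=(120, 84, 10)):
    """Return (in_features, out_features, parameter_count) per dense layer.

    The 16 x 5 x 5 encoder output is an assumption for illustration;
    the widths 120, 84, and 10 are the dense-layer output sizes.
    """
    # Flattening turns (batch, C, H, W) into (batch, C * H * W)
    in_features = channels * height * width
    layers = []
    for out_features in widths:
        # One weight per (input, output) pair plus one bias per output
        params = in_features * out_features + out_features
        layers.append((in_features, out_features, params))
        in_features = out_features
    return layers


for n_in, n_out, n_params in dense_block_summary():
    print(f"Linear({n_in} -> {n_out}): {n_params} parameters")
```

With these assumed sizes, most of the dense block's parameters sit in the first layer, because it maps the full flattened feature map down to 120 units.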
Weights are initialized using Xavier initialization.
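The text does not say which Xavier variant is used. As a minimal sketch, assuming the uniform variant, each weight is drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The function below uses only the Python standard library, and its name xavier_uniform is chosen here for illustration.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=None):
    """Sample a fan_out x fan_in weight matrix from U(-a, a),
    where a = sqrt(6 / (fan_in + fan_out)) (Xavier uniform variant)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


# Example: weights for a fully connected layer with 84 inputs and 10 outputs
weights = xavier_uniform(fan_in=84, fan_out=10)
print(len(weights), len(weights[0]))
```

For a convolutional layer, fan_in and fan_out are usually taken as the number of input or output channels multiplied by the kernel area.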
LeNet-5 remains architecturally meaningful today. When evaluated on Fashion-MNIST, LeNet-5 error rates are much closer to those of advanced architectures such as ResNet (Section 8.6) than to those achievable with basic MLPs (Section 5.2). This demonstrates the substantial leap in performance that convolutional architectures introduced over fully connected networks.
Dive into Deep Learning @ D2L