Concept

MLP of the Vision Transformer Encoder

The multilayer perceptron (MLP) in the vision Transformer encoder differs slightly from the positionwise feed-forward network (FFN) of the original Transformer encoder in two ways. First, it uses the Gaussian Error Linear Unit (GELU) as its activation function, which can be viewed as a smoother version of ReLU. Second, for regularization during training, dropout is applied to the output of each fully connected layer in the MLP.
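
To make the two modifications concrete, here is a minimal pure-Python sketch of such an MLP. It is an illustration under stated assumptions, not the D2L implementation: the helper names (`gelu`, `dropout`, `linear`, `vit_mlp`), the default dropout rate of 0.1, and the placement of the first dropout after the activation are all choices made for this example. In practice a deep learning framework's built-in linear, GELU, and dropout layers would be used instead.

```python
import math
import random


def relu(x):
    # Standard ReLU, shown only for comparison with GELU.
    return max(0.0, x)


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dropout(vec, p, training, rng):
    # Inverted dropout: zero each element with probability p and
    # rescale the survivors by 1 / (1 - p). No-op outside training.
    if not training or p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def linear(vec, weight, bias):
    # Fully connected layer; weight has shape (out_features, in_features).
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def vit_mlp(x, w1, b1, w2, b2, p=0.1, training=False, rng=random):
    # Assumed order: dense -> GELU -> dropout -> dense -> dropout.
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    hidden = dropout(hidden, p, training, rng)
    out = linear(hidden, w2, b2)
    return dropout(out, p, training, rng)


if __name__ == "__main__":
    # GELU is smooth around zero and lets small negative values through,
    # whereas ReLU clips every negative input to exactly zero.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, relu(value), round(gelu(value), 4))
```

Because dropout is only active when `training=True`, calling `vit_mlp` with the default `training=False` gives a deterministic output, which is the behavior expected at inference time.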

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L