1Cademy - MLP of the Vision Transformer Encoder

The multilayer perceptron (MLP) in the vision Transformer encoder differs only slightly from the position-wise feed-forward network (FFN) of the original Transformer. First, it replaces ReLU with the Gaussian Error Linear Unit (GELU) activation, which can be viewed as a smoother version of ReLU. Second, for additional regularization during training, dropout is applied to the output of each fully connected layer in the MLP.
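As an illustration, the following is a minimal, dependency-free sketch of such an MLP for a single token. It assumes two fully connected layers, with dense, GELU, dropout, dense, and dropout applied in that order. Placing the first dropout after the GELU is an assumption of this sketch. Plain Python lists stand in for tensors. All function names (`gelu`, `linear`, `dropout`, `vit_mlp`, `vit_mlp_tokens`), the weight layout, and the default dropout rate of 0.1 are hypothetical choices made for illustration. GELU is computed exactly as x·Φ(x), where Φ is the standard normal CDF.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # hypothetical layout: (out_features, in_features)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU, the activation used in the original Transformer FFN (for comparison)."""
    return max(0.0, x)


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y = W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def dropout(x: Vector, p: float, training: bool, rng: random.Random) -> Vector:
    """Inverted dropout: zero each entry with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout probability must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # identity at inference time
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def vit_mlp(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
            p: float, training: bool, rng: random.Random) -> Vector:
    """Hypothetical ViT-style MLP for one token: dense -> GELU -> dropout -> dense -> dropout."""
    h = [gelu(v) for v in linear(x, w1, b1)]  # first fully connected layer + GELU
    h = dropout(h, p, training, rng)          # dropout after the first layer
    y = linear(h, w2, b2)                     # second fully connected layer
    return dropout(y, p, training, rng)       # dropout after the second layer


def vit_mlp_tokens(tokens: List[Vector], w1: Matrix, b1: Vector, w2: Matrix,
                   b2: Vector, p: float = 0.1, training: bool = False,
                   seed: int = 0) -> List[Vector]:
    """Position-wise application: the same weights are applied to every token independently."""
    rng = random.Random(seed)  # one generator shared across tokens
    return [vit_mlp(t, w1, b1, w2, b2, p, training, rng) for t in tokens]
```

For example, `gelu(-1.0)` returns about -0.1587, while `relu(-1.0)` returns 0.0. This shows how GELU lets small negative inputs through instead of cutting them off sharply at zero. Because dropout is the identity when `training=False`, the MLP is deterministic at inference time.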

Updated 2026-05-15

References

Dive into Deep Learning