Purpose and Structure of the Feed-Forward Network (FFN) in Transformers
In Transformer models, the Feed-Forward Network (FFN) sub-layer plays a crucial role by introducing non-linearity into the representation learning process; without it, the representations produced by the self-attention mechanism would be prone to degenerating. Structurally, a standard FFN consists of two fully connected layers: the first applies a linear transformation followed by a non-linear activation function such as ReLU, while the second is a purely linear layer. The FFN is applied position-wise, meaning the same transformation is applied independently to each position's representation.
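As a minimal sketch of this two-layer structure (assuming PyTorch; the dimension names d_model and d_ff are illustrative choices, not taken from the text above):

```python
import torch
import torch.nn as nn

class PositionWiseFFN(nn.Module):
    """Two fully connected layers with a ReLU between them."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)  # first layer, feeds the non-linearity
        self.linear2 = nn.Linear(d_ff, d_model)  # second, purely linear layer
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the same weights are applied
        # independently at every position.
        return self.linear2(self.activation(self.linear1(x)))
```

Because nn.Linear acts on the last dimension, the module transforms each position's vector independently and returns a tensor of the same shape as its input.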

Related
Encoder Structure of Transformer
Decoder Structure of Transformer
Self-Attention as a Source of Inference Difficulty in Transformers
Scaled Dot-Product Attention
Multi-Head Self-Attention Function
A standard processing block in a Transformer model consists of two main sub-layers applied in sequence. The first sub-layer's primary role is to relate different positions of the input sequence to compute a new representation for each position. The second sub-layer then applies an identical non-linear transformation to each position's representation independently. How does the core computational function, denoted as F(·), implemented within each of these sub-layers, differ?

A standard processing block in a certain neural network architecture consists of two main sub-layers. Each sub-layer's computation can be described as applying a core function, F(·), within a structure that also includes a residual connection and layer normalization. Match each sub-layer type with the correct description of its core computational function, F(·).
Identifying Core Functions in a Transformer Block
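For reference, both sub-layers share the same outer structure and differ only in the core function F(·). In a common post-norm formulation (standard Transformer notation, not quoted from the course text):

$$\text{output} = \text{LayerNorm}\big(x + F(x)\big), \qquad
F(x) =
\begin{cases}
\text{MultiHeadSelfAttention}(x) & \text{first sub-layer} \\
\text{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2 & \text{second sub-layer}
\end{cases}$$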
Learn After
Feed-Forward Network (FFN) Formula and Component Dimensions in Transformers
An engineer is building a deep neural network for sequence processing. Each layer of the network consists of a self-attention mechanism followed by a position-wise sub-layer. The engineer designs this position-wise sub-layer to be composed of two consecutive linear transformations. What is the most significant negative consequence of omitting a non-linear activation function between these two linear transformations?
Analysis of a Position-Wise Sub-Layer
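As background for this question, a brief sketch (hypothetical PyTorch code, with small illustrative dimensions) showing that two consecutive linear transformations with no activation between them collapse into a single linear transformation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin1 = nn.Linear(8, 32, bias=False)  # first linear transformation
lin2 = nn.Linear(32, 8, bias=False)  # second linear transformation

# With no non-linearity in between, the composition is itself linear:
# lin2(lin1(x)) = x @ W1.T @ W2.T = x @ (W2 @ W1).T
combined = nn.Linear(8, 8, bias=False)
with torch.no_grad():
    combined.weight.copy_(lin2.weight @ lin1.weight)

x = torch.randn(4, 8)
print(torch.allclose(lin2(lin1(x)), combined(x), atol=1e-5))  # -> True
```

The stack of two layers has no more expressive power than the single combined layer, which is why the activation between them matters.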
A researcher modifies the position-wise sub-layer within a sequence processing model. The standard design for this sub-layer is a sequence of: a linear transformation, a non-linear activation, and a second linear transformation. The researcher's modification adds a second non-linear activation function immediately after the final linear transformation. Which of the following best evaluates the impact of this architectural change?
FFN Hidden Size in Transformers