Concept

Purpose and Structure of the Feed-Forward Network (FFN) in Transformers

In Transformer models, the Feed-Forward Network (FFN) sub-layer plays a crucial role by introducing non-linearity into representation learning: without it, stacked sub-layers would compose into (near-)linear maps, and the representations produced by the self-attention mechanism would degenerate. Structurally, a standard FFN consists of two fully connected layers applied independently at every position: the first expands the input to a larger hidden dimension and passes it through a non-linear activation such as ReLU, and the second linearly projects the result back to the model dimension, i.e. FFN(x) = max(0, xW_1 + b_1)W_2 + b_2.
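As a concrete sketch of this two-layer structure, the position-wise FFN can be written in a few lines of PyTorch. This is a minimal illustration, not code from the course: the class name PositionwiseFFN is made up here, and the sizes d_model=512 and d_ff=2048 are the illustrative defaults from the original Transformer paper.

```python
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    """Minimal sketch of the Transformer FFN sub-layer:
    FFN(x) = max(0, x W1 + b1) W2 + b2, applied at each position."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)  # expand to the hidden size
        self.activation = nn.ReLU()              # the non-linearity
        self.linear2 = nn.Linear(d_ff, d_model)  # project back to d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); every position is transformed
        # independently by the same two linear maps.
        return self.linear2(self.activation(self.linear1(x)))
```

For example, PositionwiseFFN()(torch.randn(2, 16, 512)) returns a tensor of the same shape (2, 16, 512), since the FFN mixes no information across positions.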

Updated 2026-04-21

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models