Concept

Activation function of the FFN in transformers

Vanilla transformers use the ReLU activation in the FFN sublayer. Other functions used as activations include:

  • Swish function -> f(x) = x · sigmoid(βx)
  • Gaussian Error Linear Unit (GELU)
  • Gated Linear Units (GLU)
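
For concreteness, here is a minimal PyTorch sketch of these three activations (the function names, the gated-FFN layout, and the layer shapes are illustrative assumptions, not taken from the course):

```python
import torch
import torch.nn as nn

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Swish: f(x) = x * sigmoid(beta * x); beta = 1 gives SiLU.
    return x * torch.sigmoid(beta * x)

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact GELU: f(x) = x * Phi(x), where Phi is the standard
    # normal CDF, computed here via the error function.
    return 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))

class GLUFeedForward(nn.Module):
    # A gated FFN sublayer: GLU(x) = (W x + b) * sigmoid(V x + c),
    # followed by a projection back to the model dimension.
    # (Branch names and this exact layout are assumptions for illustration.)
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w = nn.Linear(d_model, d_ff)   # "value" branch
        self.v = nn.Linear(d_model, d_ff)   # "gate" branch
        self.out = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(self.w(x) * torch.sigmoid(self.v(x)))
```

As a quick sanity check, swish with beta = 1 matches torch.nn.functional.silu, and this gelu matches torch.nn.functional.gelu in its exact (erf-based) form.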

Updated 2025-10-06

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences