Learn Before
Gated Linear Unit (GLU)
The Gated Linear Unit (GLU) is a family of activation functions that has gained popularity for its use in Large Language Models (LLMs). The specific variant of a GLU is defined by its internal non-linear activation function, denoted as σ(·). For example, using the GELU function for σ(·) results in GeGLU, and using the Swish function results in SwiGLU.
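To make the variants concrete, below is a minimal PyTorch-style sketch of a GLU feed-forward block, assuming the two-projection formulation described in Shazeer [2020]; the class and parameter names are illustrative, not taken from any particular library.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GLUFeedForward(nn.Module):
        """Illustrative GLU-style FFN block. The choice of sigma selects the
        variant: F.gelu gives GeGLU, F.silu gives SwiGLU (SiLU is Swish with
        beta = 1)."""

        def __init__(self, d_model: int, d_hidden: int, sigma=F.silu):
            super().__init__()
            self.gate_proj = nn.Linear(d_model, d_hidden)   # path passed through sigma
            self.value_proj = nn.Linear(d_model, d_hidden)  # purely linear path
            self.out_proj = nn.Linear(d_hidden, d_model)
            self.sigma = sigma

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Element-wise product of the non-linear gate and the linear value path.
            return self.out_proj(self.sigma(self.gate_proj(x)) * self.value_proj(x))

For example, GLUFeedForward(512, 1365, sigma=F.gelu) would give a GeGLU block; shrinking the hidden width to roughly two-thirds of the usual FFN size is a common practice, suggested in Shazeer [2020], to keep the parameter count comparable to a standard two-matrix FFN.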
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gaussian Error Linear Unit (GELU)
Gated Linear Unit (GLU)
A machine learning engineer is analyzing the feed-forward network (FFN) component of a transformer model. They want to replace the standard Rectified Linear Unit (ReLU) activation function with a more modern alternative to potentially improve model performance. Which of the following statements best analyzes the rationale for using a function like the Gaussian Error Linear Unit (GELU) or Swish over ReLU in this context?
Match each activation function that can be used in the feed-forward network of a transformer model with its corresponding description.
Evaluating an Activation Function Change in a Transformer FFN
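For reference alongside the questions above, here is a small NumPy sketch of the three activations being compared; the exact GELU is computed from the standard normal CDF via the error function, and Swish is shown in its general β form (β = 1 is also called SiLU).

    import numpy as np
    from scipy.special import erf

    def relu(x):
        # Hard threshold: exactly zero output (and zero gradient) for x < 0.
        return np.maximum(0.0, x)

    def gelu(x):
        # x * Phi(x), where Phi is the standard normal CDF: smooth, and it
        # lets small negative inputs through with a small negative output.
        return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

    def swish(x, beta=1.0):
        # x * sigmoid(beta * x): smooth and, like GELU, non-monotonic near zero.
        return x / (1.0 + np.exp(-beta * x))

The smoothness and non-zero response for negative inputs are the rationale usually offered for preferring GELU or Swish over ReLU in a transformer FFN.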
Learn After
Gated Linear Unit (GLU) Formula
GeGLU (GELU-based Gated Linear Unit)
SwiGLU (Swish-based Gated Linear Unit)
Shazeer [2020] on Gated Linear Units
Structural Analysis of Gated Linear Units
The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?
A standard feed-forward network layer applies a non-linear activation function after a single linear transformation. The Gated Linear Unit (GLU) architecture, however, processes an input through two parallel linear transformations, where one path acts as a 'gate' for the other after being passed through a non-linear function. What is the primary analytical advantage of this gating mechanism compared to using a single, non-gated activation function?
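As a compact summary of the mechanism probed by the two questions above, the GLU computation from Shazeer [2020] can be written as

    GLU(x; W, V, b, c) = σ(xW + b) ⊗ (xV + c)

where ⊗ denotes the element-wise product. The non-linear path σ(xW + b) acts as a data-dependent gate that scales each component of the linear path xV + c, in contrast to a standard FFN layer σ(xW + b), which applies a single activation to one transformation.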