Gated Linear Unit (GLU) Formula

The basic form of a Gated Linear Unit (GLU) activation function is expressed by the equation:

$$\sigma_{\mathrm{glu}}(\mathbf{h}) = \sigma(\mathbf{h}\mathbf{W}_1 + \mathbf{b}_1) \odot (\mathbf{h}\mathbf{W}_2 + \mathbf{b}_2)$$

In this formulation, $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d \times d}$ and $\mathbf{b}_1, \mathbf{b}_2 \in \mathbb{R}^{d}$ denote the model parameters. The function $\sigma(\cdot)$ is an internal non-linear activation function; different choices of $\sigma(\cdot)$ lead to different versions of GLU functions.
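As a concrete sketch of the formula above, here is a minimal NumPy implementation. The function name `glu`, the example batch size, and the dimension `d = 4` are illustrative assumptions, not part of the source; the `activation` parameter reflects the point that different choices of $\sigma(\cdot)$ give different GLU variants.

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic sigmoid, the classic choice for the GLU gate.
    return 1.0 / (1.0 + np.exp(-x))

def glu(h, W1, b1, W2, b2, activation=sigmoid):
    # sigma(h W1 + b1) elementwise-multiplied with the linear branch (h W2 + b2).
    # Swapping `activation` (e.g. for a different non-linearity) yields other GLU variants.
    return activation(h @ W1 + b1) * (h @ W2 + b2)

# Illustrative usage with a small batch of two d-dimensional inputs.
rng = np.random.default_rng(0)
d = 4
h = rng.standard_normal((2, d))
W1 = rng.standard_normal((d, d)); b1 = rng.standard_normal(d)
W2 = rng.standard_normal((d, d)); b2 = rng.standard_normal(d)
out = glu(h, W1, b1, W2, b2)   # shape (2, d)
```

Because the sigmoid gate lies in $(0, 1)$, each output component is a damped copy of the corresponding component of the linear branch $\mathbf{h}\mathbf{W}_2 + \mathbf{b}_2$.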

Updated 2026-04-21
