Learn Before
Gated Linear Unit (GLU) Formula
The basic form of a Gated Linear Unit (GLU) activation function is expressed by the equation:

Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂)

In this formulation, W₁, W₂, b₁, and b₂ denote the model parameters, and ⊙ denotes the element-wise product. The function σ represents an internal non-linear activation function, where different choices of σ lead to different versions of GLU functions.
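As a concrete illustration, here is a minimal NumPy sketch of the formula above. The shapes, parameter names, and the sigmoid choice for σ are illustrative assumptions, not fixed by the formula itself:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W1, b1, W2, b2, activation=sigmoid):
    # Gate path: linear transformation followed by the internal non-linearity σ
    gate = activation(x @ W1 + b1)
    # Value path: a second, parallel linear transformation with no non-linearity
    value = x @ W2 + b2
    # Element-wise product: the gate modulates each component of the value
    return gate * value

# Illustrative shapes: a 4-dimensional input mapped to a 3-dimensional output
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
b1, b2 = np.zeros(3), np.zeros(3)
print(glu(x, W1, b1, W2, b2))

Swapping the activation argument for GELU or Swish would give the GeGLU and SwiGLU variants listed under Related below.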

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gated Linear Unit (GLU) Formula
GeGLU (GELU-based Gated Linear Unit)
SwiGLU (Swish-based Gated Linear Unit)
Shazeer [2020] on Gated Linear Units
Structural Analysis of Gated Linear Units
The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?
A standard feed-forward network layer applies a non-linear activation function after a single linear transformation. The Gated Linear Unit (GLU) architecture, however, processes an input through two parallel linear transformations, where one path acts as a 'gate' for the other after being passed through a non-linear function. What is the primary analytical advantage of this gating mechanism compared to using a single, non-gated activation function?
Learn After
An activation function is defined by the formula:
Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂)

where Input is a vector, W₁, W₂, b₁, b₂ are learnable parameters, σ is a non-linear function (such as the sigmoid function), and ⊙ denotes the element-wise product. What is the primary functional role of the σ(Input ⋅ W₁ + b₁) component in this architecture?

Calculating the Output of a Gated Activation
In the formula for a gated activation, Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂), what is the primary reason that the two resulting vectors, one from each parallel path, must have the same dimensions?
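To make the dimension constraint concrete, here is a small worked example; the specific numbers are illustrative:

import numpy as np

# The gate path and value path outputs must share a shape for ⊙ to be defined.
gate  = np.array([0.9, 0.1, 0.5])   # σ(Input ⋅ W₁ + b₁): illustrative values in (0, 1)
value = np.array([2.0, 4.0, -6.0])  # Input ⋅ W₂ + b₂: illustrative values

# The element-wise product pairs the i-th gate entry with the i-th value entry:
print(gate * value)  # [ 1.8  0.4 -3. ]

# If the shapes differed, this pairing would be undefined:
# gate * np.array([1.0, 2.0])  -> raises ValueError (shape mismatch)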