Learn Before
Gated Linear Unit (GLU) Formula
The basic form of a Gated Linear Unit (GLU) activation function is expressed by the equation:

Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂)

In this formulation, W₁, W₂, b₁, and b₂ denote the model parameters, and ⊙ denotes the element-wise product. The function σ represents an internal non-linear activation function, where different choices of σ lead to different versions of GLU functions.
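As a concrete illustration, here is a minimal NumPy sketch of the formula above. The shapes, parameter names, and the sigmoid choice for σ are illustrative assumptions, not fixed by the formula itself:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W1, b1, W2, b2, activation=sigmoid):
    # Gate path: linear transformation followed by the internal non-linearity σ
    gate = activation(x @ W1 + b1)
    # Value path: a second, parallel linear transformation with no non-linearity
    value = x @ W2 + b2
    # Element-wise product: the gate modulates each component of the value
    return gate * value

# Illustrative shapes: a 4-dimensional input mapped to a 3-dimensional output
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
b1, b2 = np.zeros(3), np.zeros(3)
print(glu(x, W1, b1, W2, b2))

Swapping the activation argument for GELU or Swish would give the GeGLU and SwiGLU variants listed under Related below.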

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gated Linear Unit (GLU) Formula
GeGLU (GELU-based Gated Linear Unit)
SwiGLU (Swish-based Gated Linear Unit)
Shazeer [2020] on Gated Linear Units
Structural Analysis of Gated Linear Units
The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?
A standard feed-forward network layer applies a non-linear activation function after a single linear transformation. The Gated Linear Unit (GLU) architecture, however, processes an input through two parallel linear transformations, where one path acts as a 'gate' for the other after being passed through a non-linear function. What is the primary analytical advantage of this gating mechanism compared to using a single, non-gated activation function?
Learn After
An activation function is defined by the formula:
Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂)

where Input is a vector, W₁, W₂, b₁, b₂ are learnable parameters, σ is a non-linear function (such as the sigmoid function), and ⊙ denotes the element-wise product. What is the primary functional role of the σ(Input ⋅ W₁ + b₁) component in this architecture?

Calculating the Output of a Gated Activation
In the formula for a gated activation, Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂), what is the primary reason that the two resulting vectors, one from each parallel path, must have the same dimensions?
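To make the dimension constraint concrete, here is a small worked example; the specific numbers are illustrative:

import numpy as np

# The gate path and value path outputs must share a shape for ⊙ to be defined.
gate  = np.array([0.9, 0.1, 0.5])   # σ(Input ⋅ W₁ + b₁): illustrative values in (0, 1)
value = np.array([2.0, 4.0, -6.0])  # Input ⋅ W₂ + b₂: illustrative values

# The element-wise product pairs the i-th gate entry with the i-th value entry:
print(gate * value)  # [ 1.8  0.4 -3. ]

# If the shapes differed, this pairing would be undefined:
# gate * np.array([1.0, 2.0])  -> raises ValueError (shape mismatch)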