SwiGLU (Swish-based Gated Linear Unit)
SwiGLU is a variant within the Gated Linear Unit (GLU) family of activation functions. It is obtained by using the Swish function as the gating non-linearity, i.e., as the σ(·) in the general GLU formula.
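As a concrete sketch, the definition above can be written as SwiGLU(x) = Swish(xW) ⊙ (xV), where ⊙ is the element-wise product. The weight names `W`, `V`, the omission of bias terms, and the toy dimensions below are illustrative assumptions, not part of the original note:

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); beta = 1 recovers SiLU.
    return x / (1.0 + np.exp(-beta * x))

def swiglu(x, W, V):
    # GLU structure: two parallel linear transforms of x; one path
    # is passed through a non-linearity (here Swish) and gates the
    # other via an element-wise product.
    return swish(x @ W) * (x @ V)

# Toy example with hypothetical shapes.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))   # batch of 2, model dim 4
W = rng.standard_normal((4, 8))   # gate projection
V = rng.standard_normal((4, 8))   # value projection
out = swiglu(x, W, V)
print(out.shape)
```

Replacing Swish with GELU in the gate path gives the related GeGLU variant listed under Related below.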
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Gated Linear Unit (GLU) Formula
GeGLU (GELU-based Gated Linear Unit)
SwiGLU (Swish-based Gated Linear Unit)
Shazeer [2020] on Gated Linear Units
Structural Analysis of Gated Linear Units
The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?
A standard feed-forward network layer applies a non-linear activation function after a single linear transformation. The Gated Linear Unit (GLU) architecture, however, processes an input through two parallel linear transformations, where one path acts as a 'gate' for the other after being passed through a non-linear function. What is the primary analytical advantage of this gating mechanism compared to using a single, non-gated activation function?