Learn Before
  • Gated Linear Unit (GLU)

GeGLU (GELU-based Gated Linear Unit)

GeGLU is a variant in the Gated Linear Unit (GLU) family of activation functions. It is obtained by choosing the Gaussian Error Linear Unit (GELU) as the non-linear function σ(·) in the general GLU structure, so that one linear projection of the input, passed through GELU, gates a second linear projection via an element-wise product.
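As a rough illustration of the definition above, the sketch below writes GeGLU as GELU(xW + b) ⊙ (xV + c) in plain Python. The symbol names W, b, V, c, the helper functions, and the toy weights are assumptions made for this example and do not come from the card itself; the exact (erf-based) form of GELU is used, whereas some implementations use a tanh approximation instead.

```python
import math


def gelu(z: float) -> float:
    """Exact GELU: z * Phi(z), where Phi is the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear(x, W, b):
    """Compute x @ W + b for a vector x (length d_in) and a d_in x d_out matrix W."""
    return [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]


def geglu(x, W, b, V, c):
    """GeGLU(x) = GELU(xW + b) * (xV + c), multiplied element-wise."""
    gate = [gelu(z) for z in linear(x, W, b)]  # non-linear gating path
    value = linear(x, V, c)                    # plain linear path
    return [g * v for g, v in zip(gate, value)]


# Hypothetical toy weights, chosen only to make the two paths easy to follow.
x = [1.0, -2.0]
W = [[1.0, 0.0], [0.0, 1.0]]  # gate projection (identity)
b = [0.0, 0.0]
V = [[2.0, 0.0], [0.0, 2.0]]  # value projection (doubles the input)
c = [0.0, 0.0]

out = geglu(x, W, b, V, c)
print(out)
```

Replacing `gelu` with a different non-linearity in the gating path would give another member of the GLU family rather than GeGLU; for example, a Swish gate would correspond to SwiGLU.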

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Gated Linear Unit (GLU) Formula

  • GeGLU (GELU-based Gated Linear Unit)

  • SwiGLU (Swish-based Gated Linear Unit)

  • Shazeer [2020] on Gated Linear Units

  • Structural Analysis of Gated Linear Units

  • The Gated Linear Unit (GLU) architecture processes an input through two parallel linear transformations. One of these transformed outputs is then passed through a non-linear function before being combined with the other via an element-wise product. What is the analytical purpose of this non-linearly transformed path in the overall mechanism?

  • A standard feed-forward network layer applies a non-linear activation function after a single linear transformation. The Gated Linear Unit (GLU) architecture, however, processes an input through two parallel linear transformations, where one path acts as a 'gate' for the other after being passed through a non-linear function. What is the primary analytical advantage of this gating mechanism compared to using a single, non-gated activation function?

Learn After
  • GeGLU (GELU-based Gated Linear Unit) Formula

  • Applications of GeGLU in Large Language Models

  • An activation function is constructed by taking an input, applying two separate linear transformations to it, and then combining the results. One transformed output is passed through a non-linear 'gating' function, and the result is then multiplied element-wise with the other transformed output. For this entire structure to be correctly identified as a GeGLU, what must be true about the gating function?

  • Analyzing a Gating Mechanism

  • Analysis of a Custom Activation Unit