Learn Before
GeGLU (GELU-based Gated Linear Unit) Formula
The GeGLU (GELU-based Gated Linear Unit) activation function is defined by the following formula:
σ_geglu(h) = σ_gelu(hW₁ + b₁) ⊙ (hW₂ + b₂)
In this equation, h represents the input, while W₁, W₂, b₁, and b₂ are learnable model parameters (weights and biases). The function σ_gelu is the Gaussian Error Linear Unit (GELU) activation, and ⊙ signifies the element-wise product.
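A minimal sketch of this computation in NumPy, assuming the tanh approximation of GELU; the function and variable names mirror the formula above, but the dimensions (input size 4, hidden size 8) and random parameter values are purely illustrative:

import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu(h, W1, b1, W2, b2):
    # Gate path: linear transform followed by the GELU non-linearity.
    gate = gelu(h @ W1 + b1)
    # Value path: a plain linear transform with no non-linearity.
    value = h @ W2 + b2
    # Element-wise product of the two paths gives the GeGLU output.
    return gate * value

# Illustrative usage with random parameters (shapes are assumptions).
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4))                # batch of 2 input vectors
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(4, 8)), np.zeros(8)
print(geglu(h, W1, b1, W2, b2).shape)      # (2, 8)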
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
GeGLU (GELU-based Gated Linear Unit) Formula
Applications of GeGLU in Large Language Models
An activation function is constructed by taking an input, applying two separate linear transformations to it, and then combining the results. One transformed output is passed through a non-linear 'gating' function, and the result is then multiplied element-wise with the other transformed output. For this entire structure to be correctly identified as a GeGLU, what must be true about the gating function?
Analyzing a Gating Mechanism
Analysis of a Custom Activation Unit
Learn After
An activation function is defined by the formula:
f(x) = GELU(xW₁ + b₁) ⊙ (xW₂ + b₂), where x is the input, W and b are learnable parameters, GELU is an activation function, and ⊙ denotes an element-wise product. Based on this structure, what is the primary purpose of the (xW₂ + b₂) component?
GeGLU Activation Calculation
In the GeGLU activation function, defined as
σ_geglu(h) = σ_gelu(hW₁ + b₁) ⊙ (hW₂ + b₂), both of the linear transformations (hW₁ + b₁) and (hW₂ + b₂) are passed through the GELU activation function before the element-wise product is computed.