Learn Before
In the GeGLU activation function, defined as σ_geglu(h) = σ_gelu(hW₁ + b₁) ⊙ (hW₂ + b₂), only the first linear transformation (hW₁ + b₁) is passed through the GELU activation function; the second, (hW₂ + b₂), remains linear and gates the result via the element-wise product.
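The formula above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; the array shapes and the tanh approximation of GELU are assumptions for the example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumed here; exact GELU uses the Gaussian CDF)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu(h, W1, b1, W2, b2):
    # Only the first projection goes through GELU; the second stays linear
    # and is combined via the element-wise (Hadamard) product.
    return gelu(h @ W1 + b1) * (h @ W2 + b2)

# toy shapes, chosen only for illustration
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))          # batch of 4 inputs, hidden size 8
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((8, 16)), np.zeros(16)

out = geglu(h, W1, b1, W2, b2)
print(out.shape)  # (4, 16)
```

Note that both projections map to the same output width, since ⊙ requires matching shapes.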
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An activation function is defined by the formula:
f(x) = GELU(xW₁ + b₁) ⊙ (xW₂ + b₂), where x is the input, W and b are learnable parameters, GELU is an activation function, and ⊙ denotes an element-wise product. Based on this structure, what is the primary purpose of the (xW₂ + b₂) component?
GeGLU Activation Calculation
In the GeGLU activation function, defined as
σ_geglu(h) = σ_gelu(hW₁ + b₁) ⊙ (hW₂ + b₂), only the first linear transformation (hW₁ + b₁) is passed through the GELU activation function; the second, (hW₂ + b₂), remains linear and gates the result via the element-wise product.