GeGLU Activation Calculation
You are debugging a neural network layer that uses the GeGLU activation function, defined as: GeGLU(h) = GELU(hW₁ + b₁) ⊙ (hW₂ + b₂), where ⊙ represents the element-wise product. Given the case details below, calculate the final output value. Show the results of the two intermediate linear transformations before the final calculation.
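The two intermediate linear transformations and the final element-wise product can be sketched in code. This is a minimal illustration of the GeGLU formula above, not the solution to the graded question: the input vector, weight matrices, and biases below are made-up values, since the actual case details are not given here.

```python
import math
import numpy as np

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def geglu(h, W1, b1, W2, b2):
    a = h @ W1 + b1      # first intermediate linear transformation (gated through GELU)
    g = h @ W2 + b2      # second intermediate linear transformation (stays linear)
    return gelu(a) * g   # element-wise product of the two branches

# Made-up example values (not from the original question):
h = np.array([1.0, -0.5])
W1 = np.array([[0.2, 0.4], [0.1, -0.3]])
b1 = np.array([0.0, 0.1])
W2 = np.array([[0.5, -0.2], [0.3, 0.2]])
b2 = np.array([0.1, 0.0])

print("hW1 + b1 =", h @ W1 + b1)   # show the first intermediate result
print("hW2 + b2 =", h @ W2 + b2)   # show the second intermediate result
print("GeGLU(h) =", geglu(h, W1, b1, W2, b2))
```

Note that only the first branch passes through GELU; the second branch remains a plain affine transformation that acts as the gate's multiplier.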
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An activation function is defined by the formula:
f(x) = GELU(xW₁ + b₁) ⊙ (xW₂ + b₂), where x is the input, W and b are learnable parameters, GELU is an activation function, and ⊙ denotes an element-wise product. Based on this structure, what is the primary purpose of the (xW₂ + b₂) component?

GeGLU Activation Calculation
In the GeGLU activation function, defined as
σ_geglu(h) = σ_gelu(hW₁ + b₁) ⊙ (hW₂ + b₂), both of the linear transformations (hW₁ + b₁) and (hW₂ + b₂) are passed through the GELU activation function before the element-wise product is computed.