Learn Before
The SwiGLU activation function is defined by the formula: σ_swish(hW₁ + b₁) ⊙ (hW₂ + b₂). Match each component of this formula to its primary role in the computation.
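For reference, a minimal NumPy sketch of this formula (the names `swiglu`, `W1`, `b1`, etc. are illustrative, not taken from the course materials):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish (SiLU): x * sigmoid(x)
    return x * sigmoid(x)

def swiglu(h, W1, b1, W2, b2):
    # Gate path swish(hW1 + b1), value path hW2 + b2, combined element-wise.
    return swish(h @ W1 + b1) * (h @ W2 + b2)
```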
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of a Gated Activation Function
Consider a simplified SwiGLU activation function where the input vector h is [2, 1]. The learnable parameters are defined as follows:
- W1 = [[3], [1]], b1 = [0]
- W2 = [[2], [-1]], b2 = [1]
- The Swish activation function is defined as swish(x) = x * sigmoid(x).
- Assume sigmoid(7) ≈ 0.999.
Given the formula output = swish(hW1 + b1) ⊙ (hW2 + b2), where ⊙ is the element-wise product, calculate the output. Which of the following is the correct result?
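The arithmetic can be checked with a few lines of plain Python (a verification sketch using the values and the sigmoid(7) ≈ 0.999 approximation given above; not part of the original exercise):

```python
# Gate pre-activation: hW1 + b1 = 2*3 + 1*1 + 0
gate_pre = 2 * 3 + 1 * 1 + 0        # 7
# Swish with the stated approximation sigmoid(7) ≈ 0.999
gate = gate_pre * 0.999             # ≈ 6.993
# Value path: hW2 + b2 = 2*2 + 1*(-1) + 1
value = 2 * 2 + 1 * (-1) + 1        # 4
# Element-wise product (both paths are 1-dimensional here, so a scalar product)
output = gate * value               # ≈ 27.972
print(output)
```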