Learn Before
Consider a simplified SwiGLU activation function where the input vector h is [2, 1]. The learnable parameters are defined as follows:
- W1 = [[3], [1]], b1 = [0]
- W2 = [[2], [-1]], b2 = [1]
- The Swish activation function is defined as swish(x) = x * sigmoid(x).
- Assume sigmoid(7) ≈ 0.999.
Given the formula output = swish(hW1 + b1) ⊙ (hW2 + b2), where ⊙ is the element-wise product, calculate the output. Which of the following is the correct result?
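The computation above can be checked numerically. This is a minimal sketch in plain Python; the helper functions `dot`, `sigmoid`, and `swish` are illustrative, not from the source:

```python
import math

# Worked check of the SwiGLU example above.
h = [2, 1]             # input vector
W1, b1 = [3, 1], 0     # first projection (column vector flattened) and bias
W2, b2 = [2, -1], 1    # second (gating) projection and bias

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def swish(x):
    return x * sigmoid(x)

a = dot(h, W1) + b1    # hW1 + b1 = 2*3 + 1*1 + 0 = 7
g = dot(h, W2) + b2    # hW2 + b2 = 2*2 + 1*(-1) + 1 = 4
out = swish(a) * g     # swish(7) * 4 ≈ 6.993 * 4
print(round(out, 2))
```

With the stated approximation sigmoid(7) ≈ 0.999, the output is swish(7) * 4 ≈ 6.993 * 4 ≈ 27.97.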
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of a Gated Activation Function
Consider the same simplified SwiGLU setup: input h = [2, 1], W1 = [[3], [1]], b1 = [0], W2 = [[2], [-1]], b2 = [1], with swish(x) = x * sigmoid(x) and sigmoid(7) ≈ 0.999, and output = swish(hW1 + b1) ⊙ (hW2 + b2), where ⊙ is the element-wise product. The SwiGLU activation function is defined by the formula σ_swish(hW₁ + b₁) ⊙ (hW₂ + b₂). Match each component of this formula to its primary role in the computation.