Learn Before
An activation function is defined by the formula: Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂) where Input is a vector, W₁, W₂, b₁, b₂ are learnable parameters, σ is a non-linear function (such as the sigmoid function), and ⊙ denotes the element-wise product. What is the primary functional role of the σ(Input ⋅ W₁ + b₁) component in this architecture?
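To make the formula concrete, here is a minimal sketch in plain Python. The helper names (`sigmoid`, `affine`, `gated_activation`) and the toy weights are illustrative assumptions, not part of the original question. It assumes σ is the sigmoid, so the first branch yields values strictly between 0 and 1 that scale the second branch element by element.

```python
import math

def sigmoid(z):
    # Logistic sigmoid; maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def affine(x, W, b):
    # Computes x . W + b, where x has length d_in,
    # W is a d_in x d_out matrix (list of rows), and b has length d_out.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def gated_activation(x, W1, b1, W2, b2):
    # First path: sigma(x . W1 + b1), one value in (0, 1) per output element.
    gate = [sigmoid(z) for z in affine(x, W1, b1)]
    # Second path: plain linear projection x . W2 + b2.
    value = affine(x, W2, b2)
    # Element-wise product of the two paths.
    return [g * v for g, v in zip(gate, value)]

# Toy parameters (hypothetical values chosen only for illustration).
x = [1.0, -2.0]
W1 = [[0.5, -1.0], [0.25, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0]

out = gated_activation(x, W1, b1, W2, b2)
print(out)
```

With these toy values the second path passes the input through unchanged, so each output element is the input element multiplied by its sigmoid value.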
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An activation function is defined by the formula: Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂) where Input is a vector, W₁, W₂, b₁, b₂ are learnable parameters, σ is a non-linear function (such as the sigmoid function), and ⊙ denotes the element-wise product. What is the primary functional role of the σ(Input ⋅ W₁ + b₁) component in this architecture?
Calculating the Output of a Gated Activation
In the formula for a gated activation, Output = σ(Input ⋅ W₁ + b₁) ⊙ (Input ⋅ W₂ + b₂), what is the primary reason that the two resulting vectors, one from each parallel path, must have the same dimensions?
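The dimension requirement on the two parallel paths can be illustrated with a short sketch. The function name `elementwise_product` is a hypothetical helper introduced here for illustration. Because the element-wise product pairs position i of one vector with position i of the other, it is only defined when both vectors have the same length.

```python
def elementwise_product(a, b):
    # The element-wise product multiplies a[i] by b[i] for every index i,
    # so it is undefined when the two vectors differ in length.
    if len(a) != len(b):
        raise ValueError(
            f"shape mismatch: {len(a)} vs {len(b)} elements"
        )
    return [x * y for x, y in zip(a, b)]

# Matching lengths: each element of the first path scales one element of the second.
print(elementwise_product([0.5, 0.25], [2.0, -4.0]))

# Mismatched lengths: no one-to-one pairing exists, so the call is rejected.
try:
    elementwise_product([0.5, 0.25], [2.0, -4.0, 1.0])
except ValueError as err:
    print("error:", err)
```

The explicit length check is a design choice for this sketch: Python's built-in `zip` would otherwise silently truncate to the shorter vector and hide the mismatch.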