A language model architecture combines information from two sources: an 'immediate context' output and a 'retrieved knowledge' output. It uses a learned gating vector, g, to dynamically weigh these sources. The final output is calculated using the formula: Output = g ⊙ [immediate_context_output] + (1 - g) ⊙ [retrieved_knowledge_output], where ⊙ is element-wise multiplication. If, during a specific task, the values in the gating vector g are consistently close to 0.0, what does this imply about the model's behavior for that task?
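To make the formula concrete, here is a minimal sketch in plain Python. Lists stand in for the two output vectors, and the function name `gated_combine` and all numeric values are hypothetical illustrations, not outputs of any real model. The code applies the gate element by element, so each position gets its own mix of the two sources.

```python
# Minimal sketch of the gated combination described above:
#   Output = g ⊙ immediate_context_output + (1 - g) ⊙ retrieved_knowledge_output
# `gated_combine` is a hypothetical helper; all vectors are made-up values.

def gated_combine(g, immediate, retrieved):
    """Element-wise gate: g[i] weights the immediate-context value,
    (1 - g[i]) weights the retrieved-knowledge value."""
    if not (len(g) == len(immediate) == len(retrieved)):
        raise ValueError("g, immediate, and retrieved must have the same length")
    return [gi * a + (1.0 - gi) * b for gi, a, b in zip(g, immediate, retrieved)]


immediate = [1.0, 2.0, 3.0]     # hypothetical immediate-context output
retrieved = [10.0, 20.0, 30.0]  # hypothetical retrieved-knowledge output

# Gate values near 0.0 at every position
print(gated_combine([0.01, 0.02, 0.0], immediate, retrieved))
# Gate values at the extremes, and at an even split
print(gated_combine([0.0, 0.0, 0.0], immediate, retrieved))
print(gated_combine([1.0, 1.0, 1.0], immediate, retrieved))
print(gated_combine([0.5, 0.5, 0.5], immediate, retrieved))
```

With g exactly 0 the result equals the retrieved-knowledge vector, with g exactly 1 it equals the immediate-context vector, and g = 0.5 averages the two. Gate values close to 0.0 therefore give outputs that lie very close to the retrieved-knowledge values.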
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Advantage of a Learned Gating Mechanism
Calculating a Gated Attention Output