Learn Before
In a sequence-processing model using an attention mechanism, the model needs to determine which words in an input sentence are most relevant to the current word it is processing. If the 'Key' vectors associated with every word in the input sentence were made identical to each other, what would be the most direct consequence for the attention calculation?
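One way to explore this question is to compute the attention weights directly. The sketch below assumes scaled dot-product scoring followed by a softmax, which the question itself does not specify. The function names and the toy vectors are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Assumption: scaled dot-product scoring, score_i = (q . k_i) / sqrt(d).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Toy vectors (hypothetical values, not taken from the course material).
query = [0.5, -1.0, 2.0]
distinct_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
identical_keys = [[1.0, 0.0, 0.0]] * 3  # every word shares the same Key

distinct_weights = attention_weights(query, distinct_keys)
identical_weights = attention_weights(query, identical_keys)

print("distinct keys :", distinct_weights)
print("identical keys:", identical_weights)
```

Under these assumptions, identical Keys give every word the same score for a given Query, so the softmax returns equal weights and the Query can no longer single out any one word.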
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In an attention mechanism, a model calculates a score between a 'Query' vector and several 'Key' vectors to determine how much attention to pay to different parts of an input. What is the primary function of the 'Key' vector in this comparison process?
Attention Mechanism Troubleshooting