Learn Before
Debugging an Attention Mechanism
An engineer is debugging a machine translation model. They observe that the attention weights correctly highlight the relevant words in the source sentence for generating a specific word in the translation. However, the final output vector, which is a weighted sum of the Value vectors derived from the source sentence, does not seem to contain meaningful semantic information, leading to poor translation quality. Which of the three primary matrices in the attention mechanism is the most likely source of this problem? Explain your reasoning.
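The scenario in the question can be sketched numerically: correct attention weights depend only on the Query and Key projections, while the semantic content of the output comes entirely from the Value projection. The snippet below (a minimal NumPy sketch with made-up dimensions, not the course's implementation) shows that with a degenerate Value projection the softmax weights remain perfectly valid, yet the attended output carries no information.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # embedding dimension (assumed for illustration)
n = 5   # number of source tokens (assumed for illustration)

X = rng.normal(size=(n, d))        # source token embeddings
W_Q = rng.normal(size=(d, d))      # Query projection
W_K = rng.normal(size=(d, d))      # Key projection
W_V_good = rng.normal(size=(d, d)) # healthy Value projection
W_V_bad = np.zeros((d, d))         # degenerate Value projection

q = X[0] @ W_Q                     # query for one target position
K = X @ W_K

# Scaled dot-product scores and softmax -> attention weights.
# These depend only on W_Q and W_K, not on W_V.
scores = K @ q / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Output = attention-weighted sum of Value vectors.
out_good = weights @ (X @ W_V_good)
out_bad = weights @ (X @ W_V_bad)

print(np.allclose(weights.sum(), 1.0))  # weights valid in both cases
print(np.linalg.norm(out_good) > 0.0)   # meaningful output
print(np.linalg.norm(out_bad) == 0.0)   # all semantic content lost
```

Because `weights` is identical in both cases, an inspection of the attention map would look healthy, while the output vector is still useless, matching the symptom described and pointing to the Value matrix.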
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Single-Query Attention Computation with Multiplicative Scaling
Scaled Dot-Product Attention
General Attention Formula
Value Matrix for Causal Attention (V_≤i)
Value Matrix from a Sliding Window
An attention mechanism processes an input sequence of 20 tokens, where each token is represented by a 256-dimensional vector. A Value matrix (V) is generated as part of this process. Which of the following statements most accurately describes the properties and role of this V matrix?
Determining Value Matrix Dimensions