Learn Before
Interpreting an Attention Weight Matrix
Consider the sentence: 'The delivery robot dropped the package because it was faulty.' An attention mechanism processes this sequence. The table below shows the attention weights from the row corresponding to the query word 'it', indicating how strongly 'it' attends to each word in the sequence (including itself).
| Attending To -> | The | delivery | robot | dropped | the | package | because | it | was | faulty |
|---|---|---|---|---|---|---|---|---|---|---|
| Query: 'it' | 0.05 | 0.05 | 0.70 | 0.05 | 0.05 | 0.08 | 0.01 | 0.00 | 0.005 | 0.005 |
Based on this data, which word does 'it' most likely refer to? Justify your answer by explaining what the distribution of these weights signifies about the relationships the model has identified.
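The reasoning the question asks for can be checked numerically. A minimal sketch (assuming the weights in the table above): each attention row is a probability distribution over the sequence, so it sums to 1, and the largest entry marks the word the query attends to most.

```python
import numpy as np

# Tokens of the example sentence and the attention row for the query 'it'
# (values taken from the table above).
tokens = ['The', 'delivery', 'robot', 'dropped', 'the', 'package',
          'because', 'it', 'was', 'faulty']
weights = np.array([0.05, 0.05, 0.70, 0.05, 0.05, 0.08, 0.01, 0.00, 0.005, 0.005])

# An attention row is produced by a softmax, so it sums to 1.
assert abs(weights.sum() - 1.0) < 1e-9

# The word receiving the highest weight is the model's strongest
# candidate referent for 'it'.
referent = tokens[int(weights.argmax())]
print(referent)  # -> robot
```

The sharply peaked distribution (0.70 on 'robot' versus at most 0.08 elsewhere) is what signals that the model has resolved the pronoun to 'robot'.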
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Attention Weight Matrix Calculation
An attention mechanism processes the input sequence: ['The', 'robot', 'grasped', 'the', 'wrench']. The attention weight matrix is calculated to determine the contextual importance of each word. The row in the matrix corresponding to the word 'grasped' has the highest weight value in the column corresponding to the word 'wrench'. What does this high weight signify?
Interpreting an Attention Weight Matrix
In an attention mechanism processing a sequence of m items, an m × m attention weight matrix is generated. What does the i-th row of this matrix fundamentally represent?
Query-Key-Value Attention Output Matrix Product