Learn Before
In a self-attention mechanism, the output for a specific token is calculated as a weighted sum of 'value' vectors from all tokens in the sequence. If the attention weight connecting a query token to a specific value token is exactly zero, that value token contributes nothing to the final output for the query token.
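A minimal sketch of this weighted-sum step, assuming NumPy and made-up toy value vectors (the shapes and variable names are illustrative, not from any specific library API):

```python
import numpy as np

# Toy value vectors for a 3-token sequence (one 4-dimensional vector per token).
values = np.array([
    [1.0, 2.0, 3.0, 4.0],    # token 0
    [5.0, 6.0, 7.0, 8.0],    # token 1
    [9.0, 10.0, 11.0, 12.0]  # token 2
])

# Attention weights from one query token to every token in the sequence.
# The weight on token 1 is exactly zero, so its value vector is multiplied
# by 0 and drops out of the sum entirely.
weights = np.array([0.6, 0.0, 0.4])

output = weights @ values  # weighted sum of value vectors

# Recomputing without token 1 gives an identical result, confirming that
# a zero-weight token contributes nothing to the output.
output_without_token1 = weights[[0, 2]] @ values[[0, 2]]
assert np.allclose(output, output_without_token1)
print(output)  # [4.2 5.2 6.2 7.2]
```

The assertion makes the claim above concrete: because token 1's weight is exactly 0.0, deleting it from the sum leaves the output unchanged.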
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Distributed Computation of Weighted Value Sums
Single-Query Attention Computation with Multiplicative Scaling
Calculating an Attention Output Vector
In a self-attention mechanism, the output for a given input element is a weighted sum of 'value' vectors from all elements in the sequence. Consider the calculation for the word 'sat' in the phrase 'The cat sat on the mat'. If the attention weights from 'sat' to each word in the sequence (including itself) are: 'The': 0.05, 'cat': 0.45, 'sat': 0.05, 'on': 0.0, 'the': 0.0, 'mat': 0.45. Which of the following statements best describes the resulting output vector for 'sat'?
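To make the arithmetic concrete, here is a worked version of this example in NumPy; the 2-D value vectors are hypothetical, since the question does not specify them:

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
weights = np.array([0.05, 0.45, 0.05, 0.0, 0.0, 0.45])  # attention from 'sat'

# Hypothetical value vectors, one per token.
values = np.array([
    [0.1, 0.2],  # The
    [1.0, 0.0],  # cat
    [0.3, 0.3],  # sat
    [5.0, 5.0],  # on   (weight 0.0: contributes nothing)
    [7.0, 7.0],  # the  (weight 0.0: contributes nothing)
    [0.0, 1.0],  # mat
])

output = weights @ values  # weighted sum over all six value vectors
print(output)  # [0.47  0.475]
```

Whatever vectors are assigned to 'on' and 'the', the output is unchanged, since their weights are exactly zero; the result is dominated equally by 'cat' and 'mat' (0.45 each), with small contributions from 'The' and 'sat'.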
Sequence Parallelism