Applying Scaled Dot-Product Attention
You are given a single query vector, a key matrix, a value matrix, and the vector dimension d. Your task is to compute the final attention output vector by applying the scaled dot-product attention formula. Show the intermediate results for each major step.
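The steps asked for above can be sketched as follows. This is a minimal NumPy illustration of the formula Softmax(q·Kᵀ / √d)·V for a single query; the function name and the toy inputs are chosen for this example, not taken from the exercise.

```python
import numpy as np

def scaled_dot_product_attention(q, K, V, d):
    # Step 1: raw scores — dot product of the query with each key row
    scores = K @ q                      # shape (n_keys,)
    # Step 2: scale by sqrt(d) to keep the score magnitudes stable
    scaled = scores / np.sqrt(d)
    # Step 3: softmax over the scaled scores to get attention weights
    #         (subtracting the max is for numerical stability only)
    weights = np.exp(scaled - scaled.max())
    weights /= weights.sum()
    # Step 4: weighted sum of the value rows is the attention output
    return weights @ V

# Toy example: d = 2, two key/value pairs
q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0],
              [0.0, 1.0]])
V = np.array([[1.0, 2.0],
              [3.0, 4.0]])
out = scaled_dot_product_attention(q, K, V, d=2)
```

Printing the intermediate `scores`, `scaled`, and `weights` arrays at each step reproduces the "show your work" structure the exercise asks for.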
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Causal Attention
In an attention mechanism, the scores for a query vector q are calculated by taking its dot product with a set of key vectors K. These scores are then scaled by a factor related to the vector dimension before being passed to a Softmax function to produce weights. A developer implements this but omits the scaling step, using the formula Softmax(q * K^T) * V. What is the most likely adverse effect of this omission, especially when the dimension of the key vectors is large?
Calculating Pre-Softmax Attention Scores
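The adverse effect in that related question can be demonstrated numerically: with large d, unscaled dot products have variance proportional to d, which pushes the softmax toward a near one-hot distribution and vanishing gradients. A small sketch, with arbitrary dimensions and a fixed random seed chosen for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                               # large key/query dimension
q = rng.standard_normal(d)
K = rng.standard_normal((8, d))       # 8 keys

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Unscaled scores have variance ~d, so the softmax saturates;
# dividing by sqrt(d) keeps the score variance near 1.
unscaled_weights = softmax(K @ q)
scaled_weights = softmax(K @ q / np.sqrt(d))
```

Because dividing by √d only raises the softmax temperature, the largest unscaled weight is always at least as large as the largest scaled one; with large d it is typically close to 1, starving the remaining positions.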