Learn Before
Interaction of Semantic and Positional Scores
In a system that calculates a pre-Softmax attention score by adding a linear positional bias to the scaled query-key dot product, describe a scenario where a key that is semantically less similar to a query (i.e., has a lower dot-product score) could receive a higher final attention score than a key that is semantically more similar. Explain your reasoning by referencing the components of the calculation.
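One way to reason about this scenario is to compute both components for two keys and compare the sums. The sketch below is illustrative only: it assumes the positional bias takes the ALiBi-style form PE(i, j) = -slope * (i - j) with a positive slope, and the vectors, positions, and slope value are made-up numbers, not values from the course material.

```python
import math

def pre_softmax_score(q, k, i, j, slope):
    """Scaled dot product plus a linear positional bias (illustrative sketch)."""
    d = len(q)
    # Semantic component: query-key dot product scaled by sqrt(d).
    semantic = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
    # Positional component (assumed form): penalty grows linearly with distance i - j.
    positional = -slope * (i - j)
    return semantic + positional

# Hypothetical values chosen so the arithmetic is easy to follow.
q = [1.0, 1.0, 0.0, 0.0]
k_far = [2.0, 2.0, 0.0, 0.0]   # semantically more similar, but far away (position 1)
k_near = [1.0, 1.0, 0.0, 0.0]  # semantically less similar, but close (position 9)
i = 10                          # query position
slope = 0.5                     # assumed positive slope for the linear bias

score_far = pre_softmax_score(q, k_far, i, 1, slope)    # 2.0 + (-4.5) = -2.5
score_near = pre_softmax_score(q, k_near, i, 9, slope)  # 1.0 + (-0.5) =  0.5

print(score_far, score_near)
```

Under these assumptions, the distant key's larger dot-product term (2.0 versus 1.0) is outweighed by its larger distance penalty (4.5 versus 0.5), so the nearby, less similar key ends up with the higher pre-Softmax score.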
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Complete ALiBi Attention Formula
Calculating a Pre-Softmax Attention Score with Linear Bias
In a model that adds a linear positional bias to its attention calculation, a query at position i=10 attends to two keys at positions j1=5 and j2=2. Assuming the scaled dot-product portion of the score is identical for both keys, how will the addition of the positional bias term PE(i, j) affect their final pre-Softmax attention scores?