Calculating Linear Relative Position Bias
In a causal self-attention mechanism, a bias is added to the attention score between a query at position i and a key at position j. This bias is calculated using the formula Bias = -β * (i - j) for j ≤ i. Given a learnable parameter β = 0.25, calculate the bias value that would be added to the attention score when a query at position 8 attends to a key at position 2.
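A minimal sketch, assuming plain Python and a hypothetical helper named linear_position_bias (not from any library), makes the arithmetic concrete: the distance is i - j = 8 - 2 = 6, so the bias is -0.25 * 6 = -1.5.

def linear_position_bias(i: int, j: int, beta: float) -> float:
    # Linear relative position bias: -beta * (i - j), defined for j <= i
    assert j <= i, "causal attention: a key cannot come after its query"
    return -beta * (i - j)

# Query at position 8 attending to a key at position 2, beta = 0.25:
print(linear_position_bias(i=8, j=2, beta=0.25))  # -1.5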
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a causal self-attention mechanism, a linear relative position bias is added to the attention scores. The bias for a query at position i attending to a key at position j is calculated as B = -β * (i - j) for j ≤ i, where β is a positive scalar. How would the attention behavior of a model using a large positive β value (e.g., β = 1.0) compare to a model using a small positive β value (e.g., β = 0.1)?
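One way to compare the two settings is to hold the raw query-key dot products fixed (here, all zero) so that only the bias differentiates the keys. The sketch below is a plain-Python illustration under that assumption; attention_weights is a hypothetical helper name.

import math

def attention_weights(i: int, beta: float) -> list[float]:
    # Softmax over biased scores for keys j = 0..i, assuming all raw
    # dot products are equal, so only -beta * (i - j) matters.
    scores = [-beta * (i - j) for j in range(i + 1)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Large beta concentrates attention on nearby (recent) keys;
# small beta spreads it almost uniformly over the context.
print(attention_weights(i=4, beta=1.0))  # sharply peaked at j = 4
print(attention_weights(i=4, beta=0.1))  # close to uniform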
Calculating Linear Relative Position Bias
In a causal self-attention mechanism, a linear penalty is added to the query-key dot products based on their relative distance. The penalty for a query at position i and a key at position j is calculated as -β * (i - j), where j ≤ i and β is a positive constant. For a query at position 4 (i = 4), which of the following lists correctly represents the penalties applied to the keys at positions 0 through 4 (j = 0, 1, 2, 3, 4), respectively?
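Enumerating the formula for i = 4 makes the expected pattern explicit: -4β, -3β, -2β, -β, 0. The plain-Python sketch below uses an illustrative β = 0.5; the question itself leaves β symbolic.

beta = 0.5  # illustrative value only; any positive constant works
i = 4
penalties = [-beta * (i - j) for j in range(i + 1)]  # j = 0, 1, 2, 3, 4
print(penalties)  # [-2.0, -1.5, -1.0, -0.5, 0.0], i.e. -4β, -3β, -2β, -β, 0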