In a self-attention mechanism, a bias term is added to the score calculated between a query at position i and a key at position j. Consider a scenario where this bias term is designed to be a large negative value when the distance |i - j| is large, and it approaches zero as the distance gets smaller. How would this specific design influence the model's behavior?
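To make the scenario concrete, here is a minimal sketch of such a bias (an ALiBi-style linear distance penalty; the `slope` value and the uniform raw scores are hypothetical choices for illustration) and its effect on the softmax attention weights:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distance_bias(i, j, slope=1.0):
    """Bias that is ~0 when |i - j| is small and strongly negative
    when the distance is large (slope is an assumed hyperparameter)."""
    return -slope * abs(i - j)

# Assume uniform raw query-key scores so only the bias differentiates keys.
seq_len = 8
query_pos = 4
raw_score = 1.0
biased = [raw_score + distance_bias(query_pos, j) for j in range(seq_len)]
weights = softmax(biased)

# Attention mass concentrates on keys near the query position.
for j, w in enumerate(weights):
    print(f"key {j}: weight {w:.3f}")
```

Because the penalty grows with distance, the softmax assigns the largest weight to the key at the query's own position and exponentially less to distant keys, which is the behavior the question asks you to reason about.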
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
Diagnosing Model Behavior via Positional Bias
Designing a Positional Bias for a Specific Task