In a self-attention mechanism, a bias term is added to the score calculated between a query at position i and a key at position j. Consider a scenario where this bias term is designed to be a large negative value when the distance |i - j| is large, and it approaches zero as the distance gets smaller. How would this specific design influence the model's behavior?
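To make the scenario concrete, here is a minimal sketch of such a bias (an ALiBi-style linear distance penalty; the `slope` value and the uniform raw scores are hypothetical choices for illustration) and its effect on the softmax attention weights:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distance_bias(i, j, slope=1.0):
    """Bias that is ~0 when |i - j| is small and strongly negative
    when the distance is large (slope is an assumed hyperparameter)."""
    return -slope * abs(i - j)

# Assume uniform raw query-key scores so only the bias differentiates keys.
seq_len = 8
query_pos = 4
raw_score = 1.0
biased = [raw_score + distance_bias(query_pos, j) for j in range(seq_len)]
weights = softmax(biased)

# Attention mass concentrates on keys near the query position.
for j, w in enumerate(weights):
    print(f"key {j}: weight {w:.3f}")
```

Because the penalty grows with distance, the softmax assigns the largest weight to the key at the query's own position and exponentially less to distant keys, which is the behavior the question asks you to reason about.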
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
Diagnosing Model Behavior via Positional Bias
Designing a Positional Bias for a Specific Task