Multiple Choice

In a self-attention mechanism, a bias term is added to the score calculated between a query at position i and a key at position j. Consider a scenario where this bias term is designed to be a large negative value when the distance |i - j| is large, and it approaches zero as the distance gets smaller. How would this specific design influence the model's behavior?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science