
Learn Before

Visual Example of a Linear Relative Position Bias in Causal Attention

Multiple Choice

In a causal self-attention mechanism, a linear relative position bias is added to the attention scores. The bias for a query at position `i` attending to a key at position `j` is calculated as `B = -β * (i - j)` for `j ≤ i`, where β is a positive scalar. How would the attention behavior of a model using a large positive β value (e.g., β = 1.0) compare to a model using a small positive β value (e.g., β = 0.1)?
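
One way to explore the question is to compute the attention weights directly. The sketch below is illustrative only: the function name `biased_attention_weights`, the sequence length of 8, and the all-zero content scores are assumptions chosen so that the bias `B = -β * (i - j)` is the only thing shaping the softmax output. It considers a single query at the last position attending to every earlier key.

```python
import math

def biased_attention_weights(scores, beta):
    """Softmax over one query's causal scores after adding B = -beta * (i - j).

    `scores` holds the raw (pre-bias) scores for keys j = 0..i, where the
    query sits at the last position i. This helper is hypothetical.
    """
    i = len(scores) - 1
    biased = [s - beta * (i - j) for j, s in enumerate(scores)]
    peak = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Assumption: identical content scores, so only the position bias matters.
scores = [0.0] * 8

large_beta = biased_attention_weights(scores, beta=1.0)
small_beta = biased_attention_weights(scores, beta=0.1)

print("beta=1.0:", [round(w, 3) for w in large_beta])
print("beta=0.1:", [round(w, 3) for w in small_beta])
```

Under these assumptions, the weight on a key decays as `exp(-β * (i - j))`, so the larger β concentrates most of the attention on the nearest tokens, while the smaller β spreads it more evenly across the context. With real content scores the bias shifts the distribution rather than fully determining it.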

Updated 2025-09-26
