Learn Before
In a standard attention mechanism, an attention score is computed from a query vector (q) and a key vector (k). Consider a modification where a learnable scalar bias is added directly to the query-key dot product before the result is scaled and passed through a Softmax function. The value of this bias is determined solely by the relative distance between the query and key. How does this specific modification influence the attention mechanism's behavior?
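The modification described above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name, the bias-table layout, and the single-head shapes are all assumptions made for the example. The key detail matches the question: the learnable scalar bias, indexed only by the relative distance between query and key positions, is added to the raw dot product before the result is divided by √d and passed through Softmax.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_relative_bias(Q, K, bias_table):
    """Attention weights with a relative-position bias (hypothetical sketch).

    Q, K: (n, d) query and key matrices for a single head.
    bias_table: 1-D learnable array of length 2n - 1, indexed by the
    relative distance (j - i) shifted by n - 1 so indices are non-negative.
    """
    n, d = Q.shape
    logits = Q @ K.T                      # raw query-key dot products, (n, n)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    b = bias_table[(j - i) + (n - 1)]     # bias depends only on relative distance
    return softmax((logits + b) / np.sqrt(d))   # bias added BEFORE scaling
```

One property worth noticing: because Softmax is invariant to adding a constant to every logit in a row, a bias table that is constant across all distances changes nothing; only the differences between biases at different distances influence the attention distribution.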
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating T5 Attention Score with Relative Position Bias
A researcher implements a modified attention mechanism where the learnable scalar bias, based on relative position, is applied after the query-key dot product is scaled. The formula used is: Softmax(q·k/√d + b). What is the most significant consequence of this specific modification compared to the standard approach of adding the bias before scaling?
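The two placements can be compared directly in code. This is a hedged sketch (function names and shapes are invented for the example), but it demonstrates one concrete relationship: because the bias is learnable, adding it after scaling is representationally equivalent to adding a √d-times-larger bias before scaling, so the difference shows up in the effective magnitude (and gradient scale) of the bias rather than in what the mechanism can express.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scores_bias_before_scaling(Q, K, b):
    # Standard variant from the question: Softmax((q.k + b) / sqrt(d))
    d = Q.shape[-1]
    return softmax((Q @ K.T + b) / np.sqrt(d))

def scores_bias_after_scaling(Q, K, b):
    # Modified variant: Softmax(q.k / sqrt(d) + b)
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d) + b)
```

Since (q·k + b·√d)/√d = q·k/√d + b, the after-scaling variant with bias b produces exactly the same attention weights as the before-scaling variant with bias b·√d.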