Learn Before
A language model computes its pre-normalized attention scores using the formula: Score = (query_vector ⋅ key_vector + β ⋅ (key_position - query_position)) / sqrt(dimension). In this model, the scalar hyperparameter β is a fixed negative number. Consider a query token at position i=10. How does the bias term β ⋅ (key_position - query_position) influence the scores for a key token at position j=12 compared to a key token at position j=20, assuming all other components of the score are equal for both keys?
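The effect of the bias term can be checked numerically. Below is a minimal sketch; the value β = -1.0 is an illustrative assumption (the question only states that β is a fixed negative number):

```python
# Illustrative sketch of the positional bias term beta * (j - i),
# assuming beta = -1.0 (any fixed negative scalar behaves the same way)
# and a query at position i = 10.
beta = -1.0
i = 10

def bias(j):
    """Positional bias added to the pre-normalized attention score."""
    return beta * (j - i)

# The nearby key (j = 12) is penalized less than the distant key (j = 20),
# so with equal query-key dot products the closer key scores higher.
print(bias(12))  # -2.0
print(bias(20))  # -10.0
```

Because β is negative and (j - i) grows with distance, the penalty grows linearly with distance, biasing attention toward nearby tokens.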
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculating a Pre-Softmax Attention Score with Positional Bias
In a language model using the complete ALiBi attention formula for causal text generation, the model needs to prevent a query token at position i from attending to any key token at a future position j (where j > i). How does the Mask(i, j) term within the formula α(i, j) = Softmax((q_iᵀk_j + β⋅(j-i))/√d + Mask(i, j)) achieve this?
Modeling Arbitrarily Long Sequences with ALiBi
Tuning the ALiBi Bias Scalar (β)
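The causal-masking question above can be sketched in code. This is an illustrative implementation, not the card's answer; d = 64 and β = -1.0 are assumed values:

```python
import math

def masked_alibi_score(i, j, qk_dot, beta=-1.0, d=64):
    """Pre-softmax ALiBi score with a causal mask.

    Mask(i, j) is 0 for j <= i and negative infinity for j > i, so the
    softmax assigns every future key an attention weight of exactly zero.
    """
    mask = 0.0 if j <= i else -math.inf
    return (qk_dot + beta * (j - i)) / math.sqrt(d) + mask

# A past key keeps a finite score; a future key is driven to -inf.
print(masked_alibi_score(10, 8, qk_dot=2.0))   # finite score
print(masked_alibi_score(10, 12, qk_dot=2.0))  # -inf
```

Since exp(-inf) = 0, adding -inf before the softmax is equivalent to removing future keys from the attention distribution entirely.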