Multiple Choice

A language model computes its pre-normalized attention scores using the formula: Score = (query_vector ⋅ key_vector + β ⋅ (key_position - query_position)) / sqrt(dimension), where the scalar hyperparameter β is a fixed negative number. Consider a query token at position i = 10. Assuming all other components of the score are equal for both keys, how does the bias term β ⋅ (key_position - query_position) affect the score for a key token at position j = 12 compared with the score for a key token at position j = 20?
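To make the formula concrete, here is a minimal sketch that evaluates it for both key positions. The function name `attention_score` and the values β = -0.5, dimension = 64, and a dot product of 3.0 are illustrative assumptions; the question only states that β is negative and that the remaining terms are equal for both keys.

```python
import math


def attention_score(qk_dot, query_pos, key_pos, beta, dim):
    """Pre-normalized score: (q . k + beta * (key_pos - query_pos)) / sqrt(dim)."""
    return (qk_dot + beta * (key_pos - query_pos)) / math.sqrt(dim)


# Hypothetical values chosen only for illustration.
beta = -0.5      # any fixed negative number fits the question
dim = 64         # assumed vector dimension
qk_dot = 3.0     # assumed identical dot product for both keys
query_pos = 10   # query position i from the question

bias_near = beta * (12 - query_pos)  # offset of 2 positions
bias_far = beta * (20 - query_pos)   # offset of 10 positions

score_near = attention_score(qk_dot, query_pos, 12, beta, dim)
score_far = attention_score(qk_dot, query_pos, 20, beta, dim)

print(f"j=12: bias={bias_near:+.2f}, score={score_near:.4f}")
print(f"j=20: bias={bias_far:+.2f}, score={score_far:.4f}")
```

Under these assumed values, the offsets are 2 and 10, so the biases are 2β and 10β. Because β is negative, both keys are penalized, and the penalty for j = 20 is five times the penalty for j = 12 before the shared division by sqrt(dimension).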


Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science