Short Answer

Interaction of Semantic and Positional Scores

In a system that calculates a pre-Softmax attention score by adding a linear positional bias to the scaled query-key dot product, describe a scenario where a key that is semantically less similar to a query (i.e., has a lower dot-product score) could receive a higher final attention score than a key that is semantically more similar. Explain your reasoning by referencing the components of the calculation.
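
The following is a minimal Python sketch of one such scenario. It is an illustration, not an implementation taken from the course material. It assumes an ALiBi-style bias of -m * (i - j) for a query at position i attending to an earlier key at position j, so the penalty grows linearly with distance. The slope m = 0.25, the positions, and the 4-dimensional vectors are all hypothetical values chosen to make the arithmetic easy to follow.

```python
import math


def attention_logit(q, k, i, j, slope):
    """Return (semantic, positional, total) for one pre-Softmax score.

    Assumption: the positional term is an ALiBi-style linear penalty,
    -slope * (i - j), for a query at position i and a key at position j <= i.
    """
    d = len(q)
    # Scaled dot product: the "semantic" part of the score.
    semantic = sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d)
    # Linear distance penalty: the further back the key, the lower the score.
    positional = -slope * (i - j)
    return semantic, positional, semantic + positional


q = [1.0, 0.0, 1.0, 0.0]
k_far = [2.0, 0.0, 2.0, 0.0]   # larger dot product with q, but far away
k_near = [1.0, 1.0, 0.0, 0.0]  # smaller dot product with q, but adjacent

slope = 0.25  # hypothetical slope m
i = 10        # hypothetical query position

far = attention_logit(q, k_far, i, 2, slope)    # key at position 2
near = attention_logit(q, k_near, i, 9, slope)  # key at position 9
print("far :", far)   # (2.0, -2.0, 0.0)
print("near:", near)  # (0.5, -0.25, 0.25)
```

Here the distant key has the larger scaled dot product (2.0 vs 0.5). However, its distance penalty (-2.0 vs -0.25) outweighs that advantage, so the nearer, less similar key ends with the higher pre-Softmax score (0.25 vs 0.0). Under this assumed bias, the less similar key wins whenever its semantic deficit is smaller than m times the other key's extra distance. In this example, 1.5 < 0.25 * 7 = 1.75.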

Updated 2025-10-10

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science