Multiple Choice

In a causal self-attention mechanism, a linear penalty based on relative distance is added to each query-key dot product. For a query at position i and a key at position j, with j ≤ i, the penalty is -β * (i - j), where β is a positive constant. For a query at position 4 (i = 4), which of the following lists correctly gives the penalties applied to the keys at positions 0 through 4 (j = 0, 1, 2, 3, 4), respectively?
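As a rough check on the formula, the penalties can be computed directly. The sketch below is illustrative only: the function name linear_penalties and the value beta = 1.0 are assumptions made for the example, not part of the question. With a general β, the same pattern reads -4β, -3β, -2β, -β, 0.

```python
def linear_penalties(i, beta):
    """Penalties for a query at position i over the causal keys j = 0..i.

    Uses beta * (j - i), which is algebraically the same as -beta * (i - j).
    """
    return [beta * (j - i) for j in range(i + 1)]


# beta = 1.0 is an arbitrary illustrative value (assumption, not from the question).
penalties = linear_penalties(4, 1.0)
print(penalties)  # [-4.0, -3.0, -2.0, -1.0, 0.0]
```

The penalty grows in magnitude with distance from the query, so the most distant key (j = 0) is penalized most and the key at the query's own position (j = 4) is not penalized at all.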

Updated 2025-10-09

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science