Multiple Choice

An attention mechanism calculates normalized weights for a query token against four previous tokens, resulting in weights α₁, α₂, α₃, and α₄. Now, a new version of the mechanism is implemented which is constrained to only consider the second and fourth tokens. The attention scores are re-calculated and re-normalized over just this smaller set, resulting in new weights α'₂ and α'₄. Assuming all original weights were greater than zero, what is the relationship between the new weight for the second token, α'₂, and its original weight, α₂?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science