Learn Before
An attention mechanism calculates normalized weights for a query token against four previous tokens, resulting in weights α₁, α₂, α₃, and α₄. Now, a new version of the mechanism is implemented which is constrained to only consider the second and fourth tokens. The attention scores are re-calculated and re-normalized over just this smaller set, resulting in new weights α'₂ and α'₄. Assuming all original weights were greater than zero, what is the relationship between the new weight for the second token, α'₂, and its original weight, α₂?
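The relationship can be checked numerically: since the new softmax is taken over only tokens 2 and 4, the surviving weights are rescaled as α'₂ = α₂ / (α₂ + α₄), which strictly exceeds α₂ whenever α₁ + α₃ > 0. A minimal sketch (the raw scores below are arbitrary illustrative values, not from the original question):

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw attention scores for tokens 1..4 (any values work).
scores = [1.0, 2.0, 0.5, 1.5]
alpha = softmax(scores)                         # full weights α₁..α₄

# Restrict attention to tokens 2 and 4 and re-normalize over that subset.
alpha_sparse = softmax([scores[1], scores[3]])  # new weights α'₂, α'₄

# Re-normalizing over the subset is equivalent to rescaling the survivors:
# α'₂ = α₂ / (α₂ + α₄) > α₂, because α₂ + α₄ < 1 when α₁, α₃ > 0.
assert abs(alpha_sparse[0] - alpha[1] / (alpha[1] + alpha[3])) < 1e-12
assert alpha_sparse[0] > alpha[1]
```

The same argument applies to α'₄, so every weight that survives the sparsification grows.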
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Effect of Sparsity on Attention Weights
Analyzing Attention Weight Redistribution