Learn Before
Short Answer

Selective Computation in Optimized Attention

An optimized attention mechanism is calculating the output for a token at position i=5. The set of indices designated to contribute to this calculation is G = {1, 2, 4}. Explain why the value vector for the token at position j=3, denoted as v_3, is not included in the final weighted sum.
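The exclusion can be sketched numerically: if only indices in G receive attention scores and all other positions are masked to negative infinity, the softmax assigns them exactly zero weight, so their value vectors vanish from the weighted sum. A minimal NumPy sketch (the dimensions and random vectors are illustrative assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                             # head dimension (illustrative)
q5 = rng.standard_normal(d)       # query for the token at position i = 5
K = rng.standard_normal((6, d))   # keys for positions 0..5
V = rng.standard_normal((6, d))   # value vectors v_0 .. v_5

G = {1, 2, 4}                     # only these positions contribute

# Scores for j not in G are masked to -inf, so softmax gives them weight 0.
scores = np.full(6, -np.inf)
for j in G:
    scores[j] = q5 @ K[j] / np.sqrt(d)

weights = np.exp(scores - scores[list(G)].max())
weights /= weights.sum()

# Because 3 is not in G, v_3 receives weight exactly 0 and drops out
# of the weighted sum that forms the output for position i = 5.
assert weights[3] == 0.0
output = weights @ V              # sum over j in G of weight_j * v_j
```

In other words, v_3 is excluded not because its weight happens to be small, but because j=3 is outside G, so it is never scored at all and contributes nothing to the output.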


Updated 2025-10-10


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science