Learn Before
Trade-offs in Attention Mechanisms
An engineer proposes a modification to a standard autoregressive language model's attention mechanism. In this new design, for any given token being generated at position i, its query vector q_i will only calculate attention scores with the key-value pairs from the very first position (k_1, v_1) and its own position (k_i, v_i), instead of all preceding positions. Evaluate the primary advantage and the most significant potential disadvantage of this proposed modification.
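Before evaluating the trade-off, it can help to see the proposed attention pattern concretely. The sketch below is a minimal, illustrative NumPy implementation of the modification described above (the function name and shapes are assumptions for illustration, not the engineer's actual code): each query attends only to the key-value pairs at position 1 (index 0) and its own position.

```python
import numpy as np

def restricted_attention(Q, K, V):
    """Attention where token i attends only to position 1 (index 0) and itself.

    Q, K, V: arrays of shape (seq_len, d). Illustrative sketch of the
    proposed modification, not a production implementation.
    """
    seq_len, d = Q.shape
    out = np.empty_like(V)
    for i in range(seq_len):
        # Keys/values visible to position i: the first token and token i itself.
        idx = [0, i] if i > 0 else [0]
        scores = Q[i] @ K[idx].T / np.sqrt(d)   # at most 2 scores per query
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over the visible pairs
        out[i] = weights @ V[idx]               # weighted sum of visible values
    return out
```

Note that the inner loop touches at most two key-value pairs regardless of `i`, which is the source of the advantage (constant per-query cost) and of the disadvantage (no access to any context between position 1 and position i).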
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive model generates a sequence token by token. In a standard implementation, the query vector at position i (q_i) computes attention over the key-value pairs from all preceding positions, from 1 to i. Consider a modified implementation where the query q_i is restricted to attend only to the key-value pairs from the very first position (k_1, v_1) and its own current position (k_i, v_i). How does the computational cost of calculating the attention output for a single query q_i scale as the sequence length i grows very large (e.g., from 100 to 10,000)?
Trade-offs in Attention Mechanisms
Optimizing Attention for Long-Sequence Processing
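The scaling question in the related item above comes down to counting how many key-value pairs each query touches. A minimal sketch of that count (the function name is illustrative, not from the source): standard causal attention touches i pairs per query, while the restricted variant touches a constant number.

```python
def attended_positions(i: int, restricted: bool) -> int:
    """Number of (k, v) pairs the query q_i touches (positions are 1-indexed).

    Standard causal attention: q_i attends to all positions 1..i, so the
    per-query cost grows linearly with i. Restricted variant: q_i attends
    only to position 1 and position i, so the count is constant.
    """
    if restricted:
        return 1 if i == 1 else 2
    return i

# Standard attention scales linearly with sequence length;
# the restricted variant stays O(1) per query.
print(attended_positions(100, False), attended_positions(10_000, False))
print(attended_positions(100, True), attended_positions(10_000, True))
```

Growing the sequence from 100 to 10,000 multiplies the standard per-query cost by 100, while the restricted variant's cost does not change at all.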