Case Study

Computational Advantage of State Variables

In one variant of the attention mechanism, the entire history of key-value pairs up to position i is summarized by two cumulative state variables: a matrix μ_i (the sum of the outer products of the keys and values) and a vector ν_i (the sum of the keys). The calculation for the current step can then be performed using only the state from the previous step and the current key-value pair, without re-accessing the full history.
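The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the definitive formulation: the function names are hypothetical, the queries and keys are assumed to be strictly positive already (for example after some positive feature map), and the re-scanning baseline uses the same un-exponentiated dot-product weights rather than a softmax, so that the two versions can be compared numerically.

```python
import numpy as np


def rescan_attention(Q, K, V):
    """Baseline: at every step i, re-read all keys and values up to i."""
    n = K.shape[0]
    out = np.zeros((n, V.shape[1]))
    for i in range(n):
        weights = K[: i + 1] @ Q[i]               # one weight per past position
        out[i] = (weights @ V[: i + 1]) / weights.sum()
    return out


def recurrent_attention(Q, K, V):
    """Same outputs, but history is carried in the running states mu and nu."""
    n, d = K.shape
    mu = np.zeros((d, V.shape[1]))                # sum of outer products k_j v_j^T
    nu = np.zeros(d)                              # sum of keys k_j
    out = np.zeros((n, V.shape[1]))
    for i in range(n):
        mu += np.outer(K[i], V[i])                # update uses only the current pair
        nu += K[i]
        out[i] = (Q[i] @ mu) / (Q[i] @ nu)        # no access to earlier keys/values
    return out


# Illustrative data (assumption): positive entries keep the normalizer non-zero.
rng = np.random.default_rng(0)
Q = rng.random((6, 4)) + 0.1
K = rng.random((6, 4)) + 0.1
V = rng.random((6, 3))
same = np.allclose(rescan_attention(Q, K, V), recurrent_attention(Q, K, V))
```

In this sketch the baseline touches i + 1 key-value pairs at step i, so its per-step cost grows with the length of the history, while the recurrent version does a fixed amount of work per step that depends only on the key and value dimensions.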

Analyze the following two scenarios and determine in which one this summarization method offers the larger computational advantage over a standard attention mechanism that re-scans the entire history at each step. Justify your reasoning.

Updated 2025-10-07

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science