Short Answer

KV Cache Memory Calculation

During autoregressive decoding, a language model keeps a KV cache that stores the key and value vectors for every token in the context window, across all attention heads and layers, so they need not be recomputed when generating each subsequent token. Given the following parameters for a specific model, calculate the total number of individual floating-point values stored in this cache when it is completely full.

  • Number of layers: 32
  • Number of attention heads per layer: 12
  • Dimensionality of each head's key/value vector: 64
  • Context window size: 2048 tokens

Provide only the final numerical answer.
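As a worked sketch, the total is the product of the four parameters above times a factor of 2, under the standard assumption that both key and value vectors are cached:

```python
# Sketch of the KV cache size calculation, assuming both keys and
# values are cached for every token, head, and layer.
layers = 32
heads = 12
head_dim = 64     # dimensionality of each head's key/value vector
context = 2048    # context window size in tokens

kv_factor = 2     # one key vector and one value vector per position
total = layers * heads * head_dim * context * kv_factor
print(total)      # 100663296
```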

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science