Learn Before
Calculating Memory Growth for Token Caching
An autoregressive model uses a mechanism to store key and value vectors for previously processed tokens to speed up inference. For a sequence of 100 tokens, this storage mechanism consumes 200 MB of memory. Assuming all other model parameters remain constant, how much memory would this mechanism consume for a sequence of 400 tokens? Explain your reasoning.
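The storage mechanism described is a KV cache, whose size grows linearly with the number of tokens. A minimal sketch of the scaling arithmetic, using only the figures given in the question (200 MB for 100 tokens implies 2 MB per token; the function name is illustrative):

```python
def kv_cache_memory_mb(num_tokens: int, mb_per_token: float) -> float:
    """KV cache memory scales linearly: one key vector and one value
    vector are stored per processed token, so total memory is simply
    the per-token cost times the sequence length."""
    return num_tokens * mb_per_token

# Per-token cost implied by the question: 200 MB / 100 tokens = 2 MB/token.
mb_per_token = 200 / 100

print(kv_cache_memory_mb(400, mb_per_token))  # 800.0 MB for 400 tokens
```

Because the relationship is linear, quadrupling the sequence length from 100 to 400 tokens quadruples the cache memory from 200 MB to 800 MB.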
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reducing KV Cache Complexity via Windowed Caching
An engineer is deploying a large autoregressive model for a chatbot. They observe that as a conversation with a user gets longer, the model's memory consumption increases steadily, eventually leading to performance issues. This is because the model stores key and value vectors for every token in the conversation history to speed up the generation of the next token. Based on this mechanism, what is the fundamental relationship between the length of the conversation history (in tokens) and the amount of memory required for this storage?
KV Cache Memory Footprint Comparison
Calculating Memory Growth for Token Caching
Reducing KV Cache Complexity via Head Sharing
Formula for KV Cache Memory Size