An auto-regressive language model is generating a long text, one token at a time. To manage memory, it employs a key-value caching strategy where it only stores the keys and values for the most recent 2048 tokens. How will the memory allocated for this cache change as the model generates the 5000th token and continues beyond it?
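The behavior can be illustrated with a minimal Python sketch (using a `deque` as a stand-in for the per-token key/value entries; the window size 2048 comes from the question, everything else is illustrative): once the window fills at token 2048, every new entry evicts the oldest one, so cache memory stays flat through token 5000 and beyond.

```python
from collections import deque

WINDOW = 2048  # sliding-window size: keys/values kept only for the most recent tokens

# Each entry stands in for one token's (key, value) pair;
# deque(maxlen=...) evicts the oldest entry automatically, like the sliding window.
kv_cache = deque(maxlen=WINDOW)

sizes = {}
for token_idx in range(1, 5001):       # generate tokens 1 .. 5000
    kv_cache.append(("key", "value"))  # cache this token's K/V
    if token_idx in (1000, 2048, 2049, 5000):
        sizes[token_idx] = len(kv_cache)

print(sizes)
# Cache grows linearly until the window fills, then plateaus at 2048 entries:
# {1000: 1000, 2048: 2048, 2049: 2048, 5000: 2048}
```

So by the 5000th token the cache memory has long since stopped growing: it is constant at the footprint of 2048 tokens' keys and values, regardless of how much further generation continues.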
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Space Complexity of Sliding Window Attention
Optimizing Memory for Long-Document Processing
Comparing KV Cache Memory Growth