Short Answer

Calculating Memory Growth for Token Caching

An autoregressive model caches the key and value vectors of previously processed tokens to speed up inference. For a sequence of 100 tokens, this cache consumes 200 MB of memory. Assuming all other factors (model architecture, precision, batch size) remain constant, how much memory would the cache consume for a sequence of 400 tokens? Explain your reasoning.
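
A minimal sketch of the expected calculation, assuming (as the question implies) that the cache stores a fixed amount of memory per token, so total memory grows linearly with sequence length; the variable names are illustrative:

```python
# Sketch: cache memory under the assumed linear-growth model.
# A fixed-size key vector and value vector are stored per token,
# so total memory scales proportionally with sequence length.

base_tokens = 100      # sequence length given in the question
base_memory_mb = 200   # cache memory at 100 tokens (from the question)

per_token_mb = base_memory_mb / base_tokens   # 2 MB per token

target_tokens = 400
target_memory_mb = per_token_mb * target_tokens

print(f"{target_tokens} tokens -> {target_memory_mb:.0f} MB")
# 400 tokens -> 800 MB
```

Because memory per token is constant here (200 MB / 100 tokens = 2 MB), quadrupling the sequence length quadruples the cache: 4 x 200 MB = 800 MB.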

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy
