Case Study

State of an Autoregressive Cache

An autoregressive language model generates text one token at a time. To produce the next token in a sequence, it must attend to all the tokens that came before it. To avoid recomputing the attention inputs for those earlier tokens at every step, the model maintains a 'cache' of the key and value vectors for every token it has already processed.
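The mechanism above can be sketched in a few lines of Python. This is a toy illustration, not any real model's implementation: the embedding and the `W_K`/`W_V` projections are stand-ins for learned weights, and `D_MODEL` is an arbitrary toy dimension.

```python
# Minimal sketch of a key-value (KV) cache during autoregressive decoding.
# All names, dimensions, and "projections" here are toy assumptions.
import random

D_MODEL = 4  # toy hidden size (assumption, not a real model's dimension)

def embed(token):
    # Deterministic toy "embedding": seed an RNG with the token text.
    rng = random.Random(token)
    return [rng.uniform(-1, 1) for _ in range(D_MODEL)]

def project(vec, seed):
    # Toy stand-in for a learned linear projection (W_K or W_V).
    rng = random.Random(seed)
    return [x * rng.uniform(-1, 1) for x in vec]

kv_cache = []  # one (token, key, value) entry per token, in order

for token in ["The", "quick", "brown"]:
    h = embed(token)
    k = project(h, "W_K")  # key vector for this token
    v = project(h, "W_V")  # value vector for this token
    kv_cache.append((token, k, v))

# After generating 'The quick brown', the cache holds one key vector and
# one value vector per prior token; the next token attends over all of them
# without re-running the earlier positions through the model.
print([t for t, _, _ in kv_cache])  # -> ['The', 'quick', 'brown']
print(len(kv_cache))                # -> 3
```

The point of the cache is the last comment: at each new step only the newest token's key and value are computed and appended, while all earlier entries are simply reused.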

Given this mechanism, if the model has just finished generating the sequence 'The quick brown', what information does this cache now hold in preparation for generating the next token?


Updated 2025-10-03

Tags: Ch.2 Generative Models - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Application in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science