Single-Step Generation with a KV Cache
During each step i of autoregressive generation, the model computes a new query (q'), key (k'), and value (v') vector from the current input token. The new key-value pair (k', v') is appended to the KV cache, which holds the pairs for all preceding tokens. The attention operation is then performed using the new query and the complete set of keys and values stored in the cache up to the current step, denoted as K and V. This produces the output for step i by letting the current token attend to itself and all previous tokens in the sequence.
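The update-then-attend step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation; the function and variable names (decode_step, K_cache, V_cache) are mine, and it assumes single-head attention with 1-D q', k', v' vectors.

```python
import numpy as np

def decode_step(q_new, k_new, v_new, K_cache, V_cache):
    """One autoregressive decoding step with a KV cache (illustrative sketch)."""
    # Append the new key-value pair (k', v') to the cached K and V matrices.
    K = np.vstack([K_cache, k_new])      # shape: (t, d)
    V = np.vstack([V_cache, v_new])      # shape: (t, d)
    # Attend with the single new query q' over all cached keys and values.
    d = q_new.shape[-1]
    scores = K @ q_new / np.sqrt(d)      # one score per cached position
    scores -= scores.max()               # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum()             # softmax over all t positions
    out = weights @ V                    # attention output for step i
    return out, K, V

# Usage: a cache holding 3 preceding tokens, plus the current token's q', k', v'.
rng = np.random.default_rng(0)
d = 4
K_cache = rng.standard_normal((3, d))
V_cache = rng.standard_normal((3, d))
q, k, v = rng.standard_normal((3, d))
out, K_cache, V_cache = decode_step(q, k, v, K_cache, V_cache)
```

Note that only one query participates in the attention at each step: the keys and values of earlier tokens are read from the cache rather than recomputed, which is the entire point of the optimization.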

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Updating the KV Cache
In a self-attention layer processing an input sequence of two tokens, let the input vector for the first token be x_1 and for the second token be x_2. The layer generates a query vector q_1 (for the first token) and a key vector k_2 (for the second token). Which statement accurately describes the relationship between these inputs and generated vectors?
Correcting a Misconception in Vector Generation
Calculating a Query Vector in Self-Attention
In a standard self-attention mechanism, an input vector is transformed into three separate vectors (Query, Key, and Value) using three distinct, learned weight matrices. Imagine a modified self-attention layer where these three weight matrices are constrained to be identical. What would be the most direct consequence of this change?
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation
Diagram of the Decoding Phase
Comparison of Prefilling and Decoding Phases
Disaggregation of Prefilling and Decoding using Pipelined Engines
After a large language model processes an initial prompt, it enters a generation stage where it produces the output sequence one token at a time. In each step of this stage, a new query vector is generated for the current position, and it must perform an attention operation over the key-value pairs of the initial prompt plus all the key-value pairs of the tokens generated in previous steps. As the output sequence gets longer, what becomes the most significant performance bottleneck for generating each new token?
A large language model has finished processing an initial prompt and is about to generate the first token of its response. Arrange the following events in the correct chronological order for this single generation step.
Evaluating an Inference Optimization Proposal
You run an internal LLM inference service for empl...
You’re on-call for an internal LLM chat service. M...
You operate a GPU-backed LLM service that uses con...
Your company’s internal LLM service handles many c...
Evaluating a serving design that combines prefix caching with paged KV memory under mixed prompt lengths
Choosing a KV-cache strategy for shared-prefix traffic under GPU memory pressure
Diagnosing and Redesigning KV-Cache Memory Behavior in a Multi-Tenant LLM Serving Stack
Stabilizing latency and GPU memory in a chat-completions service with shared system prompts
Root-cause and mitigation plan for OOMs and latency spikes during shared-prefix, long-generation traffic
Post-incident analysis: KV-cache growth, fragmentation, and shared-prefix reuse in a streaming LLM service
Decoding Phase Goal Formula
Space Complexity of the KV Cache
Updating the KV Cache
Two-Phase Inference from a KV Cache Perspective
Memory Allocation for KV Caching in Standard Self-Attention
Multi-Dimensional Structure of the KV Cache
An autoregressive language model generates text one word at a time. To generate the 100th word, it must relate it to all 99 previous words. A common optimization involves storing in memory the intermediate representations for each of the first 99 words as they are generated.
Which statement best analyzes the primary computational advantage of this optimization compared to re-computing everything from scratch at step 100?
Chatbot Performance Degradation
Computational Steps in Cached Inference
Learn After
Next Token Prediction Formula
An autoregressive model is generating the 11th token of a sequence. The Key-Value (KV) Cache has already been populated with the key and value vectors for the first 10 tokens. For this 11th generation step, a new query (q_11), key (k_11), and value (v_11) vector are computed. Which of the following accurately describes the set of key vectors that the new query (q_11) will perform its attention operation over to produce the output for this step?
You are observing a single step of autoregressive generation in a transformer model, specifically for the token at position i. Arrange the following computational events in the correct chronological order for this single step.
Formula for Cache State Evolution during Autoregressive Decoding
Analyzing a Flawed KV Cache Implementation