1Cademy - Formula for Updating the Key Matrix in the KV Cache

Learn Before

Updating the KV Cache

Formula

Formula for Updating the Key Matrix in the KV Cache

During autoregressive inference, the key matrix $\mathbf{K}$ in the KV cache grows by one vector at each decoding step. The key vector $\mathbf{k}_{i'}$ computed for the current token is appended to the keys already cached, so the keys of earlier tokens are reused rather than recomputed. This update is expressed by the formula: $\mathbf{K} = \text{Append}(\mathbf{K}, \mathbf{k}_{i'})$
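The Append operation can be illustrated with a minimal, dependency-free sketch. It assumes the cache stores one key vector per row; the function name `append_key` and the numeric values are hypothetical, chosen only for illustration. Real implementations typically use tensors and may preallocate memory instead of building a new list.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def append_key(K: Matrix, k_new: Vector) -> Matrix:
    """Return a new key matrix with k_new added as the last row."""
    # All cached keys must share the same dimension as the new key.
    if K and len(K[0]) != len(k_new):
        raise ValueError("key dimension mismatch")
    return K + [list(k_new)]


# Hypothetical 2-dimensional keys, chosen only for illustration.
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # keys cached for three earlier tokens
k_new = [0.25, 0.75]                      # key of the current token

K = append_key(K, k_new)                  # K = Append(K, k_i')
print(len(K), "x", len(K[0]))             # 4 x 2
```

The earlier rows are left unchanged and only the row count increases, which is what lets attention at the current step reuse the cached keys.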

Updated 2026-05-03

References

Reference of Foundations of Large Language Models Course

Learn After

In a transformer model generating text, a matrix of 'key' vectors is maintained for all previously generated tokens. Suppose at a certain step, this matrix K contains vectors for two previous tokens and is represented as:

K = [[0.1, 0.5], [0.9, 0.2]]

The model then processes a new token and generates a corresponding new key vector k_new:

k_new = [0.4, 0.8]

Based on the standard procedure for expanding this matrix during text generation, what will the updated matrix K be after incorporating k_new?
A large language model is generating the next token in a sequence. Arrange the following steps in the correct chronological order as they relate to updating the matrix of 'key' vectors for the attention mechanism.
In an autoregressive language model generating a sequence of text, the matrix containing 'key' vectors for previously generated tokens is updated at each step. Consider a scenario where this matrix has been populated with vectors from the first 10 tokens. When the 11th token is processed and its corresponding key vector is generated, the update procedure involves replacing the key vector of the very first token with the new one to keep the matrix size constant.
