Learn Before
Reducing KV Cache Complexity via Windowed Caching
The space complexity of the standard Key-Value (KV) cache grows linearly with the number of tokens, i.e., O(n) for a sequence of n tokens, and can be reduced by caching fewer tokens. For instance, sliding window attention uses a fixed-size window to store keys and values only for the local context. This restricts the caching mechanism's space complexity to a constant O(w), where w is the window size, making memory usage manageable regardless of the overall sequence length.
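The eviction behavior described above can be sketched in a few lines of Python. This is a toy illustration, not any framework's actual API: the class name `WindowedKVCache` and the use of `collections.deque` with `maxlen` are assumptions made for clarity. Once the window fills, appending a new token's key/value pair silently drops the oldest one, so the cache size stays bounded at w no matter how long generation runs.

```python
from collections import deque

class WindowedKVCache:
    """Toy KV cache keeping keys/values only for the last `window` tokens."""

    def __init__(self, window: int):
        # deque with maxlen evicts the oldest entry automatically on append
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)

# Generate 5000 tokens with a 2048-token window:
cache = WindowedKVCache(window=2048)
for t in range(5000):
    cache.append(f"k{t}", f"v{t}")

print(len(cache))  # -> 2048: memory stays constant past the window size
```

In a real transformer the cached entries would be per-layer, per-head key and value tensors rather than strings, but the memory argument is identical: storage is O(w), not O(n).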
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Reducing KV Cache Complexity via Windowed Caching
An engineer is deploying a large autoregressive model for a chatbot. They observe that as a conversation with a user gets longer, the model's memory consumption increases steadily, eventually leading to performance issues. This is because the model stores key and value vectors for every token in the conversation history to speed up the generation of the next token. Based on this mechanism, what is the fundamental relationship between the length of the conversation history (in tokens) and the amount of memory required for this storage?
KV Cache Memory Footprint Comparison
Calculating Memory Growth for Token Caching
Reducing KV Cache Complexity via Head Sharing
Formula for KV Cache Memory Size
Learn After
Space Complexity of Sliding Window Attention
Optimizing Memory for Long-Document Processing
An auto-regressive language model is generating a long text, one token at a time. To manage memory, it employs a key-value caching strategy where it only stores the keys and values for the most recent 2048 tokens. How will the memory allocated for this cache change as the model generates the 5000th token and continues beyond it?
Comparing KV Cache Memory Growth