Concept

Reducing KV Cache Complexity via Windowed Caching

The standard Key-Value (KV) cache has space complexity $O(L \cdot \tau \cdot d_h \cdot m)$, which grows linearly with the number of tokens $m$. This cost can be reduced by caching fewer tokens. For instance, sliding window attention uses a fixed-size window $m_w$ and stores keys and values only for the local context. The cache's space complexity then becomes $O(L \cdot \tau \cdot d_h \cdot m_w)$, which is constant with respect to the overall sequence length and therefore stays manageable no matter how long the sequence grows.
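The windowed cache described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class `SlidingWindowKVCache` and the helper `cache_entries` are hypothetical names, the cache covers a single head in a single layer, and the reading of $L$ as the number of layers, $\tau$ as the number of attention heads, and $d_h$ as the per-head dimension is assumed rather than stated in the text.

```python
from collections import deque


class SlidingWindowKVCache:
    """Hypothetical cache keeping keys/values for only the last `window` tokens.

    Models one attention head in one layer; a full model would hold one such
    cache per (layer, head) pair.
    """

    def __init__(self, window: int):
        # A deque with maxlen discards the oldest entry once it is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        """Store the key/value vectors of the newest token."""
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


def cache_entries(num_layers, num_heads, head_dim, cached_tokens):
    """Scalars stored for the keys alone (values need the same amount again).

    Assumes L = num_layers, tau = num_heads, d_h = head_dim.
    """
    return num_layers * num_heads * head_dim * cached_tokens


if __name__ == "__main__":
    cache = SlidingWindowKVCache(window=4)
    for t in range(10):
        # Toy 1-dimensional key/value vectors tagged with the token index.
        cache.append([float(t)], [float(t)])

    print(len(cache))        # the cache never holds more than 4 tokens
    print(list(cache.keys))  # only the 4 most recent tokens remain

    # Full cache over m = 100 tokens versus a window of m_w = 4 tokens.
    print(cache_entries(2, 4, 8, 100), cache_entries(2, 4, 8, 4))
```

Using a bounded `deque` makes eviction implicit: each append beyond the window drops the oldest token, so memory depends on $m_w$ rather than on $m$.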

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences