Space Complexity of Sliding Window Attention

In sliding window attention, the space complexity of the Key-Value (KV) cache is reduced by storing keys and values for only a fixed-size window of the $m_w$ most recent tokens, rather than for the entire sequence. This results in a memory footprint that is constant with respect to the sequence length, given by $O(L \cdot \tau \cdot d_h \cdot m_w)$, where $L$ is the number of layers, $\tau$ is the number of attention heads, and $d_h$ is the head dimension.
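To make the formula concrete, here is a minimal Python sketch, not tied to any particular framework: a helper that evaluates the cache size in bytes (the explicit factor of 2 for keys plus values is absorbed by the big-O in the formula above), and a per-layer ring-buffer cache showing why the footprint stays constant. The configuration at the bottom ($L=32$, $\tau=8$, $d_h=128$, $m_w=4096$) is illustrative, not taken from the text.

```python
import numpy as np


def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int,
                   window_size: int, bytes_per_elem: int = 2) -> int:
    """Approximate total KV cache size in bytes across all layers.

    The leading factor of 2 counts both the key and the value tensors;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_heads * head_dim * window_size * bytes_per_elem


class SlidingWindowKVCache:
    """Per-layer KV cache backed by a fixed-size ring buffer.

    Once window_size tokens have been seen, each new token overwrites
    the slot of the oldest cached token, so memory never grows with
    sequence length.
    """

    def __init__(self, num_heads: int, head_dim: int, window_size: int):
        self.window_size = window_size
        self.keys = np.zeros((window_size, num_heads, head_dim), dtype=np.float16)
        self.values = np.zeros_like(self.keys)
        self.tokens_seen = 0

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.tokens_seen % self.window_size  # wrap around: evict oldest token
        self.keys[slot] = k
        self.values[slot] = v
        self.tokens_seen += 1


# Hypothetical configuration: L=32, tau=8, d_h=128, m_w=4096, fp16.
# 2 * 32 * 8 * 128 * 4096 entries * 2 bytes = 512 MiB.
print(kv_cache_bytes(32, 8, 128, 4096) / 2**20, "MiB")
```

With those numbers the cache occupies 512 MiB whether the model has processed five thousand or five million tokens; the trade-off is that only tokens inside the window remain directly attendable.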

