Concept

Non-Contiguous Memory Allocation in PagedAttention

The core mechanism of PagedAttention involves partitioning the KV cache into fixed-size blocks, analogous to pages in an operating system's virtual memory. These blocks can then be stored in non-contiguous locations in physical memory, which eliminates the need to find and reserve a single, large, contiguous region for each sequence's cache.
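
The sketch below is a minimal, hypothetical illustration of this idea, not vLLM's actual implementation. It assumes a fixed block size and shows how a per-sequence block table can map logical KV-cache blocks to physical blocks drawn from a shared pool, so the physical blocks backing one sequence need not be adjacent. All names (BlockAllocator, Sequence, BLOCK_SIZE) are invented for the example.

```python
# Hypothetical sketch of non-contiguous KV-cache block allocation.
import random

BLOCK_SIZE = 16  # tokens per block (assumed value for illustration)


class BlockAllocator:
    """Hands out free physical block indices from a fixed-size pool."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        random.shuffle(self.free_blocks)  # emphasize that blocks need not be adjacent

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache pool exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks one request's logical-to-physical block mapping (its block table)."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block i -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the last one is full,
        # so memory grows on demand instead of being reserved up front.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_location(self, token_idx: int) -> tuple[int, int]:
        """Return (physical block id, offset within block) for a cached token."""
        return self.block_table[token_idx // BLOCK_SIZE], token_idx % BLOCK_SIZE


if __name__ == "__main__":
    allocator = BlockAllocator(num_physical_blocks=64)
    seq = Sequence(allocator)
    for _ in range(40):  # cache 40 tokens -> ceil(40 / 16) = 3 blocks
        seq.append_token()
    print("block table:", seq.block_table)        # e.g. [57, 3, 21] -- non-contiguous
    print("token 35 lives at:", seq.physical_location(35))
```

Because lookups go through the block table, the attention kernel can gather keys and values block by block, and no sequence ever needs one large contiguous reservation.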
