Prefix Caching for LLM Inference
A caching technique that extends sequence-level caching by storing not only full sequences but also common prefixes and their associated hidden states. The system processes an input sequence as in the standard prefilling phase and saves the resulting Key-Value (KV) cache states for each prefix. When a new request shares a prefix with a previously processed sequence, the cached states are reused, avoiding redundant computation.
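As a concrete illustration of this mechanism, here is a minimal sketch in Python. It is not any particular serving framework's API: `PrefixCache`, `prefill_with_cache`, and the `model.prefill(...)` call are hypothetical names, and KV states are assumed to be stored one entry per token so they can be sliced at the prefix boundary.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Maps stored prompts (token tuples) to their per-token KV cache entries."""
    entries: dict = field(default_factory=dict)

    def longest_prefix(self, tokens):
        """Find the longest prefix of `tokens` shared with any stored prompt.

        Returns (matched_length, kv_for_that_prefix). KV entries are assumed
        to be stored one per token, so they can be sliced at the boundary.
        """
        best_len, best_kv = 0, None
        for stored, kv in self.entries.items():
            n = 0
            limit = min(len(stored), len(tokens))
            while n < limit and stored[n] == tokens[n]:
                n += 1
            if n > best_len:
                best_len, best_kv = n, kv[:n]
        return best_len, best_kv

    def store(self, tokens, kv_states):
        """Save the full prompt's per-token KV entries for later reuse."""
        self.entries[tuple(tokens)] = kv_states


def prefill_with_cache(model, tokens, cache: PrefixCache):
    """Prefill a prompt, skipping recomputation for the longest cached prefix."""
    matched, cached_kv = cache.longest_prefix(tokens)
    # Hypothetical call: takes the cached prefix entries plus the uncached
    # suffix and returns per-token KV entries for the whole prompt.
    kv_states = model.prefill(tokens[matched:], past_kv=cached_kv)
    cache.store(tokens, kv_states)
    return kv_states
```

Production servers typically hash fixed-size token blocks rather than scanning stored prompts, but the reuse principle is the same: look up the shared prefix, then prefill only the uncached suffix.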

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A company implements a caching system for its customer support chatbot. The system stores the full text of a user's question as a key and the chatbot's complete generated answer as the value. When a new question arrives, the system checks if the exact question text exists in the cache. If it does, the stored answer is returned immediately, bypassing the language model. In which of the following scenarios would this specific caching system be LEAST effective at reducing the overall response time for users?
Evaluating a Caching Strategy for an FAQ Chatbot (a minimal sketch of this exact-match cache follows the Related list)
Trade-offs in Sequence-Level Caching
Formula for KV Cache Prefilling
Prefix Caching for LLM Inference
Prefilling as an Encoding Process
Disaggregation of Prefilling and Decoding using Pipelined Engines
Prefilling in One Go (Standard Prefilling)
A large language model is given a 1000-token document to process before it begins generating a new, multi-token response. Which statement best analyzes the fundamental computational difference between how the model processes the initial 1000-token document versus how it will subsequently generate each new token for its response?
LLM Inference Performance Analysis
Parallel Self-Attention in the Prefilling Phase
The Role and Output of the Prefilling Phase
You run an internal LLM inference service for empl...
You’re on-call for an internal LLM chat service. M...
You operate a GPU-backed LLM service that uses con...
Your company’s internal LLM service handles many c...
Evaluating a serving design that combines prefix caching with paged KV memory under mixed prompt lengths
Choosing a KV-cache strategy for shared-prefix traffic under GPU memory pressure
Diagnosing and Redesigning KV-Cache Memory Behavior in a Multi-Tenant LLM Serving Stack
Stabilizing latency and GPU memory in a chat-completions service with shared system prompts
Root-cause and mitigation plan for OOMs and latency spikes during shared-prefix, long-generation traffic
Post-incident analysis: KV-cache growth, fragmentation, and shared-prefix reuse in a streaming LLM service
Decoding Network for KV Cache Generation
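For contrast with prefix caching, the FAQ-chatbot question in the list above describes sequence-level caching: the full question text is the key and the complete generated answer is the value. A minimal sketch, with illustrative names only (`ResponseCache` and `model.generate` are assumptions, not a specific API):

```python
from typing import Optional


class ResponseCache:
    """Sequence-level cache: exact question text -> complete generated answer."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def get(self, question: str) -> Optional[str]:
        # A hit requires the question to match a stored key exactly; paraphrased
        # or never-seen questions miss and fall through to the model.
        return self._answers.get(question)

    def put(self, question: str, answer: str) -> None:
        self._answers[question] = answer


def answer_with_cache(model, question: str, cache: ResponseCache) -> str:
    cached = cache.get(question)
    if cached is not None:
        return cached                      # bypass the language model entirely
    answer = model.generate(question)      # hypothetical model API
    cache.put(question, answer)
    return answer
```

Because a hit requires an exact match on the entire sequence, this approach pays off only when identical questions recur; prefix caching instead reuses work at the granularity of shared prefixes.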
Learn After
Process of Generating Prefix Caches
Process of Utilizing a Prefix Cache
Implementing Prefix Caching with a Key-Value Datastore
Memory Management Challenges in Prefix Caching
Cache Eviction Policies for Prefix Caching
An LLM inference system is designed to optimize performance by storing the intermediate hidden states generated from the initial tokens of user prompts. The system has just finished processing the request: 'Analyze the market trends for electric vehicles in North America.' Immediately after, it receives a new request: 'Analyze the market trends for electric vehicles in Europe.' How will the system leverage its optimization technique to process this second request?
Evaluating Caching Strategy Effectiveness (a toy walk-through of this shared-prefix scenario follows the Learn After list)
Choosing an Optimal Caching Strategy
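To make the shared-prefix scenario in the Learn After list concrete, here is a toy walk-through that reuses the hypothetical `PrefixCache` sketch from earlier. A whitespace split stands in for real tokenization and strings stand in for per-token KV entries, purely for illustration:

```python
# The two prompts from the scenario share the prefix
# "Analyze the market trends for electric vehicles in".
first = "Analyze the market trends for electric vehicles in North America.".split()
second = "Analyze the market trends for electric vehicles in Europe.".split()

cache = PrefixCache()
# Pretend the first request has been prefilled: store one fake KV entry per token.
cache.store(first, [f"kv({tok})" for tok in first])

matched, reused_kv = cache.longest_prefix(second)
print(matched)           # 8 -> the shared leading tokens are reused from the cache
print(second[matched:])  # ['Europe.'] -> only this token still needs prefilling
```

Only the tokens after the shared prefix need a forward pass for the second request; the KV states for the eight shared tokens come straight from the cache.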