Concept

Prefix Caching for LLM Inference

An advanced caching technique that extends simpler methods by storing not just full sequences, but also common prefixes together with their associated Key-Value (KV) cache states. These states are produced by processing an input sequence as in the standard prefilling phase and saving the resulting KV-cache entries for each prefix. When a new request shares a prefix with a previously processed sequence, the system reuses the cached states and runs the forward pass only over the remaining tokens, avoiding redundant computation.
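
The sketch below illustrates this idea in Python. The names (`PrefixKVCache`, `prefill_with_cache`, `compute_kv`) are hypothetical, not any library's API, and the KV state is reduced to a plain token list so the example stays self-contained; a real system would store per-layer key/value tensors.

```python
class PrefixKVCache:
    """Maps token-ID prefixes to their precomputed KV-cache states."""

    def __init__(self):
        # Keyed by a tuple of token IDs; value is the KV state covering that prefix.
        self._store = {}

    def insert(self, tokens, kv_state):
        """Save the KV state computed during prefill for this prefix."""
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        """Return (matched_length, kv_state) for the longest cached prefix
        of `tokens`, or (0, None) if nothing matches."""
        for n in range(len(tokens), 0, -1):
            kv = self._store.get(tuple(tokens[:n]))
            if kv is not None:
                return n, kv
        return 0, None


def prefill_with_cache(tokens, cache, compute_kv):
    """Prefill that reuses cached KV states where possible.

    `compute_kv(new_tokens, past_kv)` stands in for the model's forward
    pass: it extends `past_kv` with KV entries for the new tokens.
    """
    hit_len, past_kv = cache.longest_prefix(tokens)
    # Only the uncached suffix needs a forward pass.
    kv_state = compute_kv(tokens[hit_len:], past_kv)
    cache.insert(tokens, kv_state)
    return kv_state


# Toy stand-in for the model forward pass: the "KV state" is just the
# list of tokens it covers.
def compute_kv(new_tokens, past_kv):
    return (past_kv or []) + list(new_tokens)

cache = PrefixKVCache()
prefill_with_cache([1, 2, 3, 4], cache, compute_kv)        # full prefill
prefill_with_cache([1, 2, 3, 4, 5, 6], cache, compute_kv)  # reuses first 4 tokens
```

Production systems typically match at fixed-size token-block granularity (for example, by hashing blocks of token IDs) rather than storing whole prefixes, so cached memory can be shared and evicted per block; the exact-match dictionary above is only for illustration.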


Updated 2026-05-05


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
