Concept

Cache Eviction Policies for Prefix Caching

To manage the significant memory overhead of prefix caching, practical systems employ cache eviction policies. These policies, such as least recently used (LRU), decide which cached prefixes to remove when cache memory is exhausted. Their primary objective is to balance the computational savings gained from caching against the system's memory constraints.
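The LRU policy described above can be sketched in a few lines. The class and entry names below (`LRUPrefixCache`, `kv_state`) are illustrative assumptions, not part of any specific system; the sketch only shows the eviction mechanics, with the cached value standing in for a prefix's KV state.

```python
from collections import OrderedDict


class LRUPrefixCache:
    """Minimal LRU eviction sketch for a prefix cache (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, object]" = OrderedDict()

    def get(self, prefix: str):
        # A hit moves the prefix to the most-recently-used position.
        if prefix not in self._entries:
            return None
        self._entries.move_to_end(prefix)
        return self._entries[prefix]

    def put(self, prefix: str, kv_state: object) -> None:
        if prefix in self._entries:
            self._entries.move_to_end(prefix)
        self._entries[prefix] = kv_state
        # Evict least-recently-used prefixes once capacity is exceeded.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = LRUPrefixCache(capacity=2)
cache.put("You are a helpful assistant.", "kv-A")
cache.put("Translate to French:", "kv-B")
cache.get("You are a helpful assistant.")  # refreshes kv-A's recency
cache.put("Summarize this text:", "kv-C")  # capacity hit: evicts kv-B
print(cache.get("Translate to French:"))   # None: kv-B was evicted
```

Real inference servers track memory in bytes or KV blocks rather than entry counts, but the recency bookkeeping is the same idea.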


Updated 2026-05-05


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course
