Sequence Ordering

An LLM inference system uses a prefix cache with a fixed capacity. The cache is currently full. A new user request arrives with a prefix that is not present in the cache (a "cache miss"). To make space for this new prefix, the system must evict an existing prefix according to the Least Recently Used (LRU) policy. Arrange the following actions in the correct chronological order.
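The eviction flow the question describes maps naturally onto an ordered map keyed by prefix. Below is a minimal, hypothetical sketch in Python built on `collections.OrderedDict`; the `PrefixCache` class, its `capacity` parameter, and the token-tuple keys are illustrative assumptions, not part of any specific inference system.

```python
from collections import OrderedDict


class PrefixCache:
    """Minimal LRU prefix cache sketch (illustrative, not a real system)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # OrderedDict preserves insertion order; entries are moved to the
        # end on every hit, so the least recently used entry sits at the front.
        self._entries: "OrderedDict[tuple, object]" = OrderedDict()

    def lookup(self, prefix: tuple):
        """Return the cached KV state for `prefix`, or None on a miss."""
        if prefix in self._entries:
            self._entries.move_to_end(prefix)  # mark as most recently used
            return self._entries[prefix]
        return None

    def insert(self, prefix: tuple, kv_state: object) -> None:
        """Insert a new prefix, evicting the LRU entry if the cache is full."""
        if prefix in self._entries:
            self._entries.move_to_end(prefix)
            self._entries[prefix] = kv_state
            return
        if len(self._entries) >= self.capacity:
            # Evict the least recently used entry (front of the OrderedDict).
            self._entries.popitem(last=False)
        self._entries[prefix] = kv_state


# Chronological order of events on a cache miss with a full cache:
cache = PrefixCache(capacity=2)
cache.insert(("a",), "kv_a")            # fills slot 1
cache.insert(("b",), "kv_b")            # fills slot 2; cache is now full
assert cache.lookup(("c",)) is None     # 1. new request arrives -> cache miss
cache.insert(("c",), "kv_c")            # 2. LRU entry ("a",) is evicted
                                        # 3. new prefix is inserted
assert cache.lookup(("a",)) is None     # the evicted entry is gone
assert cache.lookup(("b",)) is not None # the more recently used entry survives
```

The usage block at the bottom traces the sequence the question asks about: the miss is detected first, the LRU victim is identified and evicted, and only then is the new prefix inserted and marked most recently used.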

