Evaluating Cache Eviction Policy Suitability
Evaluate the suitability of the Least Recently Used (LRU) cache eviction policy for both the General-Purpose Chatbot and the Code Completion Assistant described in the case study. Justify your assessment for each service, considering their distinct usage patterns and the goal of minimizing both latency and memory overhead.
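For reference when reasoning about the policy, here is a minimal sketch of an LRU prefix cache in Python. The class name `LRUPrefixCache` and the stored values are illustrative assumptions, not part of the case study; a real inference server would cache key/value tensors rather than strings.

```python
from collections import OrderedDict

class LRUPrefixCache:
    """Minimal LRU cache for text prefixes (illustrative sketch only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()  # oldest (LRU) first, newest (MRU) last

    def get(self, prefix: str):
        """Look up a prefix; a hit promotes it to the most-recently-used position."""
        if prefix not in self._entries:
            return None  # cache miss
        self._entries.move_to_end(prefix)  # hit: mark as most recently used
        return self._entries[prefix]

    def put(self, prefix: str, value) -> None:
        """Insert a prefix, evicting the least recently used entry when full."""
        if prefix in self._entries:
            self._entries.move_to_end(prefix)
        elif len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # evict the LRU entry
        self._entries[prefix] = value
```

Note that `move_to_end` and `popitem(last=False)` make recency updates and evictions O(1), which keeps the policy's own latency and memory bookkeeping overhead small regardless of the service's usage pattern.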
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An inference system for a large language model uses a cache for text prefixes to speed up processing. The cache has a capacity of 3 slots and uses a Least Recently Used (LRU) eviction policy. The cache is currently full, and its state, from most recently used to least recently used, is as follows:
- Prefix A: "The capital of France is"
- Prefix B: "Translate the following sentence to German:"
- Prefix C: "Once upon a time in a land far away,"
Now, a new user request arrives with the prompt: "The capital of France is Paris." This request is a 'hit' for Prefix A. Immediately after, another request arrives with a new, uncached prefix: "Summarize the main points of the article below:". To store this new prefix, one of the existing prefixes must be evicted. Which prefix will be removed from the cache?
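The scenario can be replayed with the sketch above to check the recency bookkeeping; the `kv_*` values are placeholders for cached state, not part of the question.

```python
cache = LRUPrefixCache(capacity=3)
# Build the stated recency order (most recent last): C, then B, then A.
cache.put("Once upon a time in a land far away,", "kv_C")         # Prefix C (least recently used)
cache.put("Translate the following sentence to German:", "kv_B")  # Prefix B
cache.put("The capital of France is", "kv_A")                     # Prefix A (most recently used)

cache.get("The capital of France is")  # hit on Prefix A; it stays most recently used
cache.put("Summarize the main points of the article below:", "kv_new")  # miss: the LRU entry is evicted
```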
An LLM inference system uses a prefix cache with a fixed capacity. The cache is currently full. A new user request arrives with a prefix that is not present in the cache (a 'cache miss'). To make space for this new prefix, the system must evict an existing one based on the Least Recently Used (LRU) policy. Arrange the following actions in the correct chronological order.
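As one way to picture the sequence, here is a hypothetical miss-handling path built on the sketch above; the function name and the `compute_kv` callback are illustrative assumptions, not the system's actual API.

```python
def handle_request(cache: LRUPrefixCache, prefix: str, compute_kv):
    """Illustrative LRU miss path: look up, detect the miss, evict if full, insert as MRU."""
    value = cache.get(prefix)       # look up the requested prefix
    if value is None:               # cache miss detected
        value = compute_kv(prefix)  # compute the entry for the new prefix
        cache.put(prefix, value)    # put() evicts the LRU entry if full, then inserts the new prefix as MRU
    return value
```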