Learn Before
A company implements a caching system for its customer support chatbot. The system stores the full text of a user's question as a key and the chatbot's complete generated answer as the value. When a new question arrives, the system checks if the exact question text exists in the cache. If it does, the stored answer is returned immediately, bypassing the language model. In which of the following scenarios would this specific caching system be LEAST effective at reducing the overall response time for users?
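The caching scheme described in the question can be illustrated with a minimal Python sketch. The `generate_answer` function is a hypothetical stand-in for the language model call; the cache keys on the verbatim question text, so any rewording or whitespace difference misses the cache.

```python
def generate_answer(question: str) -> str:
    # Placeholder for the expensive language-model call (hypothetical).
    return f"Answer to: {question}"

class ExactMatchCache:
    """Stores the full question text as key, the generated answer as value."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_response(self, question: str) -> str:
        # Exact string lookup: only a character-for-character identical
        # question is a hit, in which case the model is bypassed entirely.
        if question in self._store:
            return self._store[question]      # cache hit
        answer = generate_answer(question)    # cache miss: run the model
        self._store[question] = answer
        return answer
```

Note that because the lookup is exact, semantically identical but differently worded questions (e.g. "How do I reset my password?" vs. "how do i reset my password") each trigger a fresh model call, which is central to reasoning about when this cache helps least.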
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Prefix Caching for LLM Inference
Evaluating a Caching Strategy for an FAQ Chatbot
Trade-offs in Sequence-Level Caching