Learn Before
A company implements a caching system for its customer support chatbot. The system stores the full text of a user's question as a key and the chatbot's complete generated answer as the value. When a new question arrives, the system checks if the exact question text exists in the cache. If it does, the stored answer is returned immediately, bypassing the language model. In which of the following scenarios would this specific caching system be LEAST effective at reducing the overall response time for users?
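The caching scheme described in the question can be illustrated with a minimal Python sketch. The `generate_answer` function is a hypothetical stand-in for the language model call; the cache keys on the verbatim question text, so any rewording or whitespace difference misses the cache.

```python
def generate_answer(question: str) -> str:
    # Placeholder for the expensive language-model call (hypothetical).
    return f"Answer to: {question}"

class ExactMatchCache:
    """Stores the full question text as key, the generated answer as value."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_response(self, question: str) -> str:
        # Exact string lookup: only a character-for-character identical
        # question is a hit, in which case the model is bypassed entirely.
        if question in self._store:
            return self._store[question]      # cache hit
        answer = generate_answer(question)    # cache miss: run the model
        self._store[question] = answer
        return answer
```

Note that because the lookup is exact, semantically identical but differently worded questions (e.g. "How do I reset my password?" vs. "how do i reset my password") each trigger a fresh model call, which is central to reasoning about when this cache helps least.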
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Prefix Caching for LLM Inference
Evaluating a Caching Strategy for an FAQ Chatbot
Trade-offs in Sequence-Level Caching