Learn Before
A company is deploying a large language model for a new application. To improve performance, they cache each user's exact input prompt together with the model's complete generated output as a key-value pair. When a new prompt arrives, the system first checks whether it exactly matches a cached prompt; if so, it returns the cached output directly, skipping a new model computation. In which of the following scenarios would this optimization strategy be LEAST effective?
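A minimal sketch of the mechanism the question describes, assuming a Python serving layer; `run_model` stands in for whatever call actually invokes the LLM, and all names here are illustrative rather than any particular library's API:

```python
from hashlib import sha256

class ExactMatchResponseCache:
    """Maps an exact prompt string to a previously generated completion."""

    def __init__(self):
        self._store = {}  # prompt hash -> cached completion

    def _key(self, prompt: str) -> str:
        # Hash the full prompt so the key size is fixed regardless of prompt length.
        return sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        # Returns a hit only for a byte-for-byte identical prompt.
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, completion: str) -> None:
        self._store[self._key(prompt)] = completion


def serve(prompt: str, cache: ExactMatchResponseCache, run_model) -> str:
    cached = cache.get(prompt)
    if cached is not None:
        return cached               # cache hit: skip model computation entirely
    completion = run_model(prompt)  # cache miss: pay the full inference cost
    cache.put(prompt, completion)
    return completion
```

Because the lookup keys on the exact prompt string, even a one-character difference is a cache miss; that property is what the question asks you to reason about when judging where the strategy is least effective.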
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence-Level Caching for LLM Inference
Evaluating Caching Strategy for an LLM Application
Challenges of LLM Request-Response Caching