
Choosing an Optimal Caching Strategy

An engineering team is building an application based on a large language model and can implement one of two caching strategies to reduce computational load.

Strategy 1: Store the final, complete answer for frequently asked, identical prompts. If an incoming prompt is an exact match to a stored prompt, the saved answer is returned instantly.

Strategy 2: Store the intermediate computational state (key-value pairs) generated from the initial phrases of prompts. If an incoming prompt starts with a phrase that has been processed before, the system can load the saved state and resume computation from that point.
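Strategy 2 can be sketched in the same spirit, using single characters as stand-in tokens and a plain list as a stand-in for the model's key-value state. The functions `process_tokens` and `encode`, and the word-boundary caching rule, are illustrative assumptions rather than a real inference engine:

```python
# Minimal sketch of Strategy 2: prefix (key-value) caching.

prefix_cache = {}  # maps a prompt prefix -> saved intermediate state

def process_tokens(state, tokens):
    # Placeholder for the expensive forward passes that extend the state.
    return state + list(tokens)

def encode(prompt):
    # Find the longest cached prefix of this prompt and resume from it.
    best = ""
    for prefix in prefix_cache:
        if prompt.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    state = list(prefix_cache.get(best, []))
    new_part = prompt[len(best):]
    state = process_tokens(state, new_part)
    # Save intermediate states at word boundaries so a future prompt
    # sharing any leading phrase can resume mid-prompt.
    for i in range(len(best) + 1, len(prompt) + 1):
        if i == len(prompt) or prompt[i - 1] == " ":
            prefix_cache[prompt[:i]] = list(state[:i])
    return state, len(new_part)  # state plus how many tokens were computed
```

Here two prompts that share only an opening phrase still reuse work: the second prompt recomputes just its unique remainder, which is exactly the situation Strategy 2 is designed for.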

Consider two potential use cases for the application:

Use Case A: A customer service bot that primarily answers a list of 50 specific, unchanging frequently asked questions (e.g., 'What are your store hours?', 'What is the return policy?').

Use Case B: A code generation assistant where users often start prompts with similar instructions (e.g., 'Write a Python function that...', 'In JavaScript, create a class for...') but the remainder of the prompt is highly variable and unique.

Which use case would derive significantly more benefit from Strategy 2? Justify your answer by analyzing the nature of the prompts in each use case and explaining how they align with the mechanics of the described caching strategies.


Updated 2025-10-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy
