Draft Model Selection Rationale
An engineer is implementing a system to accelerate text generation from a large language model. They propose using a very small, extremely fast 'draft' model to generate candidate tokens. A colleague argues that a slightly larger, and therefore slower, draft model might actually result in a greater overall speed-up for the entire system. Explain the reasoning that could make the colleague's argument valid, focusing on the key trade-off involved in selecting a draft model.
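One way to frame the trade-off is a simple throughput model. This is a sketch under stated assumptions, not part of the original question: the symbols $\alpha$ (probability that the large model accepts a given draft token, treated as independent per token), $\gamma$ (draft tokens proposed per cycle), $t_d$ (draft time per token), and $t_v$ (time for one verification pass) are all introduced here for illustration.

```latex
% Expected tokens produced per draft-then-verify cycle, assuming each draft
% token is accepted independently with probability \alpha and the verifier
% contributes one token of its own at the first rejection (or after all
% \gamma drafts are accepted):
\[
  \mathbb{E}[\text{tokens per cycle}] = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}
\]
% Each cycle costs \gamma draft steps plus one verification pass, so:
\[
  \text{throughput} \approx
  \frac{1 - \alpha^{\gamma + 1}}{(1 - \alpha)\,(\gamma\, t_d + t_v)}
\]
```

Under this model, a slightly larger draft model raises $t_d$ but may raise $\alpha$ as well. When $t_v$ dominates the cycle time, a modest increase in $t_d$ barely changes the denominator, while a higher $\alpha$ can increase the numerator substantially, which is the situation in which the colleague's argument would hold.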
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
An engineer is optimizing a text generation system that uses a large, powerful model for final output. To speed up the process, they are testing two different smaller 'draft' models to propose sequences of tokens for the large model to verify.
- Draft Model X: Generates 5 candidate tokens in 10ms. On average, the large model accepts only 1 of these 5 tokens.
- Draft Model Y: Generates 5 candidate tokens in 20ms. On average, the large model accepts 4 of these 5 tokens.
Assuming the verification step by the large model takes a constant amount of time regardless of which draft model is used, which statement best analyzes the likely overall performance of the system?
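The comparison can be checked with a short calculation. The sketch below is illustrative only: the helper `tokens_per_ms` and the 30 ms verification time are hypothetical (the problem states only that verification time is constant), and it counts only the accepted draft tokens given in the problem, ignoring any extra token the large model may itself produce per cycle.

```python
def tokens_per_ms(accepted_tokens: float, draft_ms: float, verify_ms: float) -> float:
    """Accepted tokens per millisecond for one draft-then-verify cycle."""
    return accepted_tokens / (draft_ms + verify_ms)


# Hypothetical value: the problem only says verification time is constant.
VERIFY_MS = 30.0

# Draft Model X: 5 candidates in 10 ms, 1 accepted on average.
throughput_x = tokens_per_ms(accepted_tokens=1, draft_ms=10.0, verify_ms=VERIFY_MS)

# Draft Model Y: 5 candidates in 20 ms, 4 accepted on average.
throughput_y = tokens_per_ms(accepted_tokens=4, draft_ms=20.0, verify_ms=VERIFY_MS)

print(f"Model X: {throughput_x:.3f} tokens/ms")
print(f"Model Y: {throughput_y:.3f} tokens/ms")
print(f"Y is {throughput_y / throughput_x:.1f}x faster than X")
```

Under these assumptions Model Y comes out ahead for any non-negative verification time: even with zero verification cost it yields 4 tokens per 20 ms against 1 token per 10 ms, and the gap widens as verification time grows, because Y amortizes each verification pass over more accepted tokens.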