Learn Before
A team is building a system to accelerate text generation from a very large, high-quality, but slow language model. Their strategy involves using a much smaller, faster 'draft' model to propose a sequence of words first. The large model then reviews this draft sequence; if the sequence is plausible, the large model accepts it, saving time. If not, the large model rejects it and generates its own sequence from scratch. To maximize the overall speed of the system (words generated per second), which property is most desirable for the draft model's probability distribution over the next words?
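The scheme described above can be explored with a rough cost model. The sketch below is illustrative only: the function name `words_per_second`, its parameters, and the default timings are hypothetical, and it assumes that the large model reviews a whole draft in one pass and, on rejection, regenerates the same number of words itself.

```python
def words_per_second(p_accept, k=4, t_draft=0.01, t_large=0.1, t_review=0.1):
    """Expected throughput of one draft-then-review round (all inputs hypothetical).

    p_accept: probability that the large model accepts the whole k-word draft
    k:        number of words the draft model proposes per round
    t_draft:  seconds per word for the draft model
    t_large:  seconds per word when the large model generates on its own
    t_review: seconds for the large model to review one draft in a single pass
    """
    if not 0.0 <= p_accept <= 1.0:
        raise ValueError("p_accept must be in [0, 1]")
    draft_time = k * t_draft                        # always paid
    fallback_time = (1.0 - p_accept) * k * t_large  # paid only on rejection
    expected_round_time = draft_time + t_review + fallback_time
    # Every round yields k words, either from the draft or from the fallback.
    return k / expected_round_time


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"p_accept={p:.2f}: {words_per_second(p):.1f} words/s")
```

For comparison, the large model working alone produces `1 / t_large` words per second under the same assumptions, so the printed values show at which acceptance rates the two-model system beats that baseline.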
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Draft Model Effectiveness
Optimizing a Two-Model Generation System