1Cademy - Draft Model Probability Distribution ($Pr_q(\cdot)$)

Learn Before

Information

Definition

Draft Model Probability Distribution ( $Pr_q(\cdot)$ )

A draft model is a smaller, computationally cheaper model used to propose candidate tokens or sequences. Its probability distribution $Pr_q(\cdot)$ assigns a probability to each possible output, typically the next token given the context so far. Draft models are commonly used in speculative decoding to accelerate inference with a larger, more capable target model: the draft model proposes several tokens, and the target model only verifies them, which it can do for all proposed positions in one parallel forward pass, instead of generating every token one at a time.
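As an illustration of how $Pr_q(\cdot)$ is used during verification, here is a minimal sketch of the accept/reject rule commonly described for speculative sampling, reduced to a single token position. It assumes the large model's distribution is written $Pr_p(\cdot)$ (a symbol not used on this page), and the toy distributions `PR_P` and `PR_Q` and all function names are hypothetical. A drafted token $x$ is accepted with probability $\min(1, Pr_p(x)/Pr_q(x))$; on rejection, a replacement is drawn from the normalized residual $\max(0, Pr_p - Pr_q)$.

```python
import random

# Toy next-token distributions (hypothetical values, for illustration only).
PR_P = {"a": 0.5, "b": 0.3, "c": 0.2}  # large (target) model, assumed symbol Pr_p
PR_Q = {"a": 0.7, "b": 0.2, "c": 0.1}  # draft model, Pr_q


def acceptance_probability(p, q, token):
    """Probability that a token drafted from q is accepted: min(1, p/q)."""
    return min(1.0, p.get(token, 0.0) / q[token])


def residual_distribution(p, q):
    """Normalized max(0, p - q), used to resample after a rejection."""
    residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    total = sum(residual.values())
    if total == 0.0:
        # p and q are identical, so a rejection can never happen; fall back to p.
        return dict(p)
    return {t: r / total for t, r in residual.items()}


def expected_acceptance_rate(p, q):
    """Chance that a draft token is accepted: sum over tokens of min(p, q)."""
    return sum(min(p[t], q.get(t, 0.0)) for t in p)


def speculative_step(p, q, rng):
    """Draft one token from q, then accept it or resample from the residual."""
    tokens = list(q)
    draft = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    if rng.random() < acceptance_probability(p, q, draft):
        return draft, True
    res = residual_distribution(p, q)
    res_tokens = list(res)
    fallback = rng.choices(res_tokens, weights=[res[t] for t in res_tokens])[0]
    return fallback, False


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [speculative_step(PR_P, PR_Q, rng) for _ in range(10_000)]
    accepted = sum(1 for _, ok in samples if ok) / len(samples)
    print("empirical acceptance rate:", round(accepted, 3))
    print("predicted acceptance rate:", expected_acceptance_rate(PR_P, PR_Q))
```

Under this rule the emitted tokens follow $Pr_p$ rather than $Pr_q$, so the draft model affects speed but not output quality. The closer $Pr_q$ is to $Pr_p$, the higher the acceptance rate and the more large-model decoding steps are saved.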

Updated 2026-06-23

References

Reference of Foundations of Large Language Models Course

Learn After

A team is building a system to accelerate text generation from a very large, high-quality, but slow language model. Their strategy involves using a much smaller, faster 'draft' model to propose a sequence of words first. The large model then reviews this draft sequence; if the sequence is plausible, the large model accepts it, saving time. If not, the large model rejects it and generates its own sequence from scratch. To maximize the overall speed of the system (words generated per second), whic
Evaluating Draft Model Effectiveness
Optimizing a Two-Model Generation System
