1Cademy - Acceptance and Rejection Criteria for Speculated Tokens

Learn Before

Determining the Maximum Number of Consecutively Accepted Tokens in Speculative Decoding

Concept

Acceptance and Rejection Criteria for Speculated Tokens

In speculative decoding, whether a speculated token $\hat{y}_{i+t}$ is accepted or rejected depends on the probabilities assigned to it by the draft model, $q(\hat{y}_{i+t})$, and by the verification model, $p(\hat{y}_{i+t})$. If $q(\hat{y}_{i+t}) \le p(\hat{y}_{i+t})$, the speculation is accepted. By contrast, if $q(\hat{y}_{i+t}) > p(\hat{y}_{i+t})$, the speculation is rejected with probability $1 - \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}$, or equivalently accepted with probability $\frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}$. Speculated tokens are checked in order, and the first rejection ends the run of accepted tokens, so this criterion determines the maximum number of consecutively accepted tokens.
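The acceptance rule can be illustrated with a minimal Python sketch. The function names `accept_token` and `count_accepted` are hypothetical, chosen for this example only. The sketch assumes the per-token probabilities from the draft model ($q$) and the verification model ($p$) are already available as plain floats, and it leaves out everything else a full speculative decoding loop would need.

```python
import random


def accept_token(p_prob, q_prob, rng=random):
    """Decide whether one speculated token is accepted.

    p_prob: probability the verification model assigns to the token, p(y).
    q_prob: probability the draft model assigns to the token, q(y).
    rng:    any object with a random() method returning a float in [0, 1).
    """
    # Case 1: q(y) <= p(y), the speculation is always accepted.
    if q_prob <= p_prob:
        return True
    # Case 2: q(y) > p(y), reject with probability 1 - p/q,
    # which is the same as accepting with probability p/q.
    return rng.random() < p_prob / q_prob


def count_accepted(p_probs, q_probs, rng=random):
    """Count consecutively accepted tokens, stopping at the first rejection."""
    accepted = 0
    for p_prob, q_prob in zip(p_probs, q_probs):
        if not accept_token(p_prob, q_prob, rng):
            break
        accepted += 1
    return accepted
```

For example, with verification probabilities `[0.5, 0.6, 0.0, 0.9]` and draft probabilities `[0.3, 0.6, 0.4, 0.1]`, the first two tokens satisfy $q \le p$ and are accepted. The third has $p = 0$ and $q > 0$, so it is rejected with probability 1, and the fourth token is never examined: `count_accepted` returns 2.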

Updated 2026-05-05


References

Reference of Foundations of Large Language Models Course
