Concept

Acceptance and Rejection Criteria for Speculated Tokens

In speculative decoding, the decision to accept or reject a speculated token $\hat{y}_{i+t}$ depends on the probabilities assigned by the draft model, $q(\hat{y}_{i+t})$, and the verification model, $p(\hat{y}_{i+t})$. If $q(\hat{y}_{i+t}) \le p(\hat{y}_{i+t})$, the speculation is accepted. By contrast, if $q(\hat{y}_{i+t}) > p(\hat{y}_{i+t})$, the speculation is rejected with a probability of $1 - \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}$. This mechanism determines the maximum number of consecutively accepted tokens.
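
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `accept_speculated_token` and `count_accepted_prefix` are hypothetical, the probabilities are passed in as plain floats rather than read from real models, and the sketch assumes that verification stops at the first rejected token, so only the accepted prefix is counted. Rejecting with probability 1 - p/q when q > p is the same as accepting with probability p/q.

```python
import random


def accept_speculated_token(p_prob: float, q_prob: float, rng: random.Random) -> bool:
    """Decide whether one speculated token is accepted.

    p_prob: probability of the token under the verification model, p.
    q_prob: probability of the token under the draft model, q.
    """
    if q_prob <= p_prob:
        # The draft model is no more confident than the verifier: always accept.
        return True
    # Here q > p: reject with probability 1 - p/q, i.e. accept with probability p/q.
    return rng.random() < p_prob / q_prob


def count_accepted_prefix(p_probs, q_probs, rng: random.Random) -> int:
    """Count consecutively accepted tokens, assuming checking stops at the first rejection."""
    accepted = 0
    for p_prob, q_prob in zip(p_probs, q_probs):
        if not accept_speculated_token(p_prob, q_prob, rng):
            break
        accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative probabilities only; they do not come from any real model.
    p_probs = [0.50, 0.20, 0.90]
    q_probs = [0.30, 0.80, 0.10]
    print(count_accepted_prefix(p_probs, q_probs, rng))
```

In this example the first token is always accepted (q ≤ p), while the second is accepted only with probability 0.2 / 0.8 = 0.25, so the printed count depends on the random draw.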

Updated 2026-05-05

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
