Activity (Process)

Acceptance-Rejection Mechanism for Speculative Decoding

In speculative decoding, a speculated token $\hat{y}_{i+t}$ is accepted or rejected by comparing its probability under a draft model ($q$) with its probability under a target model ($p$). If the draft model's probability for the token, $q(\hat{y}_{i+t})$, is greater than the target model's probability, $p(\hat{y}_{i+t})$, the token is rejected with probability $1 - \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}$. In the alternative case, where $q(\hat{y}_{i+t}) \le p(\hat{y}_{i+t})$, the token is accepted outright.
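The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `accept_speculated_token` and its parameters `p_prob`, `q_prob`, and `rng` are hypothetical names chosen for this example, and the two probabilities are assumed to have been computed already by the target and draft models. The sketch covers only the accept/reject decision for a single token.

```python
import random


def accept_speculated_token(p_prob, q_prob, rng=random):
    """Decide whether to keep one speculated token.

    p_prob: target-model probability p(y_hat) of the token (assumed given).
    q_prob: draft-model probability q(y_hat) of the token (assumed given).
    rng:    any object with a random() method returning a float in [0, 1).
    """
    if q_prob <= 0.0:
        # The draft model sampled this token, so q should be positive.
        raise ValueError("q_prob must be positive")
    if q_prob <= p_prob:
        # The draft model did not over-estimate the token: accept outright.
        return True
    # Here q > p: reject with probability 1 - p/q,
    # which is the same as accepting with probability p/q.
    return rng.random() < p_prob / q_prob


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    trials = 10_000
    kept = sum(accept_speculated_token(0.2, 0.5, rng) for _ in range(trials))
    print(f"empirical acceptance rate: {kept / trials:.3f} (p/q = 0.4)")
```

Both branches can be summarized as accepting the token with probability $\min\left(1, \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}\right)$, so over many trials the empirical acceptance rate in the demo should be close to $p/q$ when $q > p$.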

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
