Formula

Conditional Probability Distribution of the Draft Model in Speculative Decoding

In speculative decoding, the draft model, denoted by q, defines a conditional probability distribution over the next draft token. The probability of a candidate token y_{i+t}, the t-th token drafted in the current step, is conditioned on the original input X, the sequence of already verified tokens Y_{≤i}, and all draft tokens generated earlier in the same step, ŷ_{i+1}...ŷ_{i+t-1}. This distribution is formally expressed as Pr_q(y_{i+t} | X, Y_{≤i}, ŷ_{i+1}...ŷ_{i+t-1}).
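
The conditioning structure can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than a real draft model: `VOCAB_SIZE`, the scoring rule inside `draft_distribution`, and the helper `draft_tokens` are assumptions made only for illustration. The point is that each call receives X, the verified tokens, and the draft tokens produced so far in the current step, and returns a distribution over the next token.

```python
import math
import random

VOCAB_SIZE = 5  # hypothetical toy vocabulary: token ids 0..4


def draft_distribution(x, verified, drafted):
    """Toy stand-in for Pr_q(y_{i+t} | X, Y_{<=i}, drafted tokens so far).

    x        : input token ids (X)
    verified : already verified output tokens (Y_{<=i})
    drafted  : draft tokens generated earlier in the current step
    """
    context = list(x) + list(verified) + list(drafted)
    # Hypothetical scoring rule: deterministic logits derived from the context.
    # A real draft model would run a small neural network here instead.
    seed = sum(context) + len(context)
    logits = [math.sin(0.7 * (v + 1) * seed) for v in range(VOCAB_SIZE)]
    # Numerically stable softmax turns the logits into a distribution.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def draft_tokens(x, verified, num_draft, rng):
    """Draft num_draft tokens one at a time, feeding each one back in."""
    drafted, dists = [], []
    for _ in range(num_draft):
        # The condition grows by one draft token at every position t.
        probs = draft_distribution(x, verified, drafted)
        token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        dists.append(probs)
        drafted.append(token)
    return drafted, dists


if __name__ == "__main__":
    tokens, dists = draft_tokens([1, 2], [3], num_draft=4, rng=random.Random(0))
    for t, (tok, probs) in enumerate(zip(tokens, dists), start=1):
        print(f"t={t} token={tok} probs={[round(p, 3) for p in probs]}")
```

The sketch returns the per-position distributions alongside the drafted tokens, on the assumption that a later verification step would need the draft probabilities of the sampled tokens.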

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
