Learn Before
Conditional Probability Distribution of the Draft Model in Speculative Decoding
In speculative decoding, the draft model, denoted by q, defines a conditional probability distribution for generating the next token. The probability of any candidate token y_{i+t} is conditioned on the original input X, the sequence of already verified tokens Y_{≤i}, and all previously generated draft tokens in the current step, ŷ_{i+1}...ŷ_{i+t-1}. This distribution is formally expressed as Pr_q(y_{i+t} | X, Y_{≤i}, ŷ_{i+1}...ŷ_{i+t-1}).
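The conditioning described above can be sketched in code. This is a minimal toy illustration, not a specific library's API: `toy_draft_logits` stands in for the draft model q, and the sampling loop shows how each draft token is drawn from a distribution conditioned on the input, the verified prefix, and the draft tokens generated so far in the current step. All names here are illustrative.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_draft_logits(context_ids):
    # Stand-in for the draft model q: logits depend on the full context,
    # including any draft tokens already appended in this step.
    return [float((tok + sum(context_ids)) % 5) for tok in range(len(VOCAB))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def generate_draft_tokens(context_ids, k, rng):
    ids = list(context_ids)  # encodes X and the verified tokens Y_<=i
    drafts = []
    for _ in range(k):
        # Pr_q(y_{i+t} | X, Y_<=i, y-hat_{i+1}...y-hat_{i+t-1})
        probs = softmax(toy_draft_logits(ids))
        r, acc, next_id = rng.random(), 0.0, len(VOCAB) - 1
        for j, p in enumerate(probs):
            acc += p
            if r < acc:
                next_id = j
                break
        drafts.append(next_id)
        ids.append(next_id)  # earlier draft tokens condition later ones
    return drafts

drafts = generate_draft_tokens([0, 1, 2], k=3, rng=random.Random(0))
```

Note that `ids.append(next_id)` inside the loop is what makes the distribution for the t-th candidate depend on ŷ_{i+1}...ŷ_{i+t-1}: each draft token is fed back into the context before the next one is sampled.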

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Parallel Verification in Speculative Decoding
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Evaluation of Draft Tokens by the Verification Model
Structure of the Full Sequence After a Speculative Decoding Step
A text generation system uses two models: a small, fast 'draft' model and a large, accurate 'verification' model to speed up output. Arrange the following events to correctly represent one cycle of this generation process, starting from a given text prefix.
A text generation system uses a fast 'draft' model and a more accurate 'verification' model. The draft model proposes the 4-token sequence [jumped, over, the, moon]. The verification model then evaluates this sequence and determines that the first two tokens (jumped, over) are correct, but the third token (the) is incorrect. Based on the rules of this generation algorithm, what is the immediate result of this verification step?
Efficiency Limits of a Two-Model Generation System
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Explaining a “Fast but Wrong” Speculative Decoding Regression
Root-Causing Low Speedup Despite Parallel Verification
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
Learn After
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Imagine a text generation system where a small, fast model first generates a short sequence of candidate tokens (e.g., C1, C2, C3). Then, a large, accurate model checks all these candidates at once. Let's say the system has already produced a confirmed sequence of tokens: ['The', 'cat', 'sat']. The small model has just generated two candidate tokens in the current step: ['on', 'the']. What information does the small model use to calculate the probability distribution for the next candidate token (C3)?
In a speculative decoding process, a draft model q generates a sequence of candidate tokens. The probability distribution for the t-th candidate token in the sequence, y_{i+t}, is conditioned on the original input X, the verified token sequence Y_{≤i}, and one other crucial set of tokens. Complete the formal expression for this conditional probability: Pr_q(y_{i+t} | X, Y_{≤i}, ______).
Consider a speculative decoding process where a draft model is generating a sequence of three candidate tokens (ŷ₁, ŷ₂, ŷ₃) after a verified prefix. The probability distribution used to select the third token, ŷ₃, is calculated independently of the first two candidate tokens, ŷ₁ and ŷ₂.