Mathematical Formulation of Draft Model Prediction in Speculative Decoding
In speculative decoding, the draft model prediction phase starts with a given prefix, denoted as [X, Y_{≤i}] (the original input X followed by the i verified tokens Y_{≤i}). The draft model is used to predict the next K consecutive tokens, represented as ŷ_{i+1}, ŷ_{i+2}, …, ŷ_{i+K}. Generation is a token-by-token process: each new token is chosen by greedily selecting the one with the highest probability under the draft model's distribution Pr_q, conditioned on the prefix and all previously generated draft tokens. This is formally expressed as: ŷ_{i+t} = argmax_y Pr_q(y | X, Y_{≤i}, ŷ_{i+1}, …, ŷ_{i+t-1}).
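The greedy, token-by-token draft loop above can be sketched in Python. This is a minimal illustration, not a specific library's API: `draft_prob` stands in for the draft model's distribution Pr_q, and the function names are hypothetical.

```python
import numpy as np

def draft_generate(draft_prob, context, K):
    """Greedily generate K draft tokens with a draft model.

    draft_prob(context) -> probability vector over the vocabulary,
    conditioned on the full context (prefix plus draft tokens so far).
    Illustrative sketch; names are not from a specific library.
    """
    drafts = []
    ctx = list(context)  # start from the prefix [X, Y_<=i]
    for _ in range(K):
        probs = draft_prob(ctx)        # Pr_q(. | X, Y_<=i, drafts so far)
        y_hat = int(np.argmax(probs))  # greedy pick: highest-probability token
        drafts.append(y_hat)
        ctx.append(y_hat)              # condition the next step on this token
    return drafts
```

Note how each iteration appends the newly chosen token to the context, so ŷ_{i+t} is conditioned on all of ŷ_{i+1}, …, ŷ_{i+t-1}, exactly as in the formal expression.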

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Token Selection from Probability Distribution
Step-by-Step Example of Auto-Regressive Sequence Generation
Iterative Context Update in Autoregressive Generation
Key-Value (KV) Cache in Transformer Inference
Sequential Generation of Output Tokens
Context Shifting in Auto-Regressive Generation
A language model is generating a sentence and has so far produced the sequence:
['The', 'cat', 'sat']. Based on the principles of sequential, one-at-a-time token generation, where each new token depends on the ones before it, what is the direct input the model will use to determine the next token in the sequence?

A language model generates text by producing a single token at each step, using the entire sequence generated so far as the context for the next token. Arrange the following events in the correct chronological order to illustrate the generation of two new tokens following the initial input 'The ocean is'.
A researcher develops a novel text generation model. Given an input like 'The movie was', instead of generating one token at a time, this model predicts the entire completion (e.g., 'incredibly boring and predictable') in a single, parallel step. Which core principle of the standard auto-regressive process is fundamentally violated by this new model's design?
Imagine a text generation system where a small, fast model first generates a short sequence of candidate tokens (e.g., C1, C2, C3). Then, a large, accurate model checks all these candidates at once. Let's say the system has already produced a confirmed sequence of tokens:
['The', 'cat', 'sat']. The small model has just generated two candidate tokens in the current step: ['on', 'the']. What information does the small model use to calculate the probability distribution for the next candidate token (C3)?

In a speculative decoding process, a draft model q generates a sequence of candidate tokens. The probability distribution for the t-th candidate token in the sequence, y_{i+t}, is conditioned on the original input X, the verified token sequence Y_{≤i}, and one other crucial set of tokens. Complete the formal expression for this conditional probability: Pr_q(y_{i+t} | X, Y_{≤i}, ______).

Consider a speculative decoding process where a draft model is generating a sequence of three candidate tokens (ŷ₁, ŷ₂, ŷ₃) after a verified prefix. The probability distribution used to select the third token, ŷ₃, is calculated independently of the first two candidate tokens, ŷ₁ and ŷ₂.
Parallel Verification in Speculative Decoding
Conditional Probability Distribution of the Draft Model in Speculative Decoding
Evaluation of Draft Tokens by the Verification Model
Structure of the Full Sequence After a Speculative Decoding Step
A text generation system uses two models: a small, fast 'draft' model and a large, accurate 'verification' model to speed up output. Arrange the following events to correctly represent one cycle of this generation process, starting from a given text prefix.
A text generation system uses a fast 'draft' model and a more accurate 'verification' model. The draft model proposes the 4-token sequence:
[jumped, over, the, moon]. The verification model then evaluates this sequence and determines that the first two tokens (jumped, over) are correct, but the third token (the) is incorrect. Based on the rules of this generation algorithm, what is the immediate result of this verification step?

Efficiency Limits of a Two-Model Generation System
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Explaining a “Fast but Wrong” Speculative Decoding Regression
Root-Causing Low Speedup Despite Parallel Verification
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
In a production LLM service using speculative deco...
You are reviewing logs from a production LLM endpo...
You are implementing speculative decoding in a cus...
Learn After
Example of Draft Token Generation in Speculative Decoding
A system uses a fast draft model to autoregressively generate a sequence of several candidate tokens from a given prefix. The model generates these candidates one by one, and for each step, it greedily selects the token with the highest probability according to its own distribution,
Pr_q. If the system is in the process of generating the third candidate token in the sequence, ŷ_{i+3}, which of the following represents the correct set of information the draft model's probability distribution must be conditioned on for this specific step?

A developer is implementing the draft token generation phase of a text generation system. The system is designed to autoregressively produce a short sequence of candidate tokens at each step. The developer's code for generating the third token in a sequence, ŷ_{i+3}, incorrectly conditions the draft model's probability distribution only on the initial prefix [X, y_{≤i}] and the first candidate token ŷ_{i+1}, omitting the second candidate token ŷ_{i+2} from the context. What is the most likely consequence of this specific error?

A fast, approximate language model is tasked with generating a sequence of three candidate tokens (ŷᵢ₊₁, ŷᵢ₊₂, ŷᵢ₊₃) starting from a given text prefix P. Arrange the following actions in the correct chronological order to describe how this sequence is produced.
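The verification step described above (accept draft tokens up to the first mismatch, then substitute the verification model's own token) can be sketched as follows. This shows only the greedy-matching variant implied by the scenario; the names `verify_prob` and `verify_drafts` are illustrative, and probabilistic acceptance schemes used in practice are not shown.

```python
import numpy as np

def verify_drafts(verify_prob, context, drafts):
    """Check draft tokens against the verification model's greedy choices.

    Accepts drafts up to the first mismatch; at the mismatch, the verification
    model's own token replaces the rejected draft and the rest are discarded.
    Illustrative sketch (greedy-matching variant only).
    """
    accepted = []
    ctx = list(context)  # the verified prefix so far
    for d in drafts:
        target = int(np.argmax(verify_prob(ctx)))  # large model's greedy token
        if d == target:
            accepted.append(d)       # draft confirmed, keep going
            ctx.append(d)
        else:
            accepted.append(target)  # replace the rejected draft, stop here
            break
    return accepted
```

In the [jumped, over, the, moon] scenario, this returns [jumped, over, &lt;verifier's token&gt;]: the first two drafts are kept, the third is replaced, and the fourth is thrown away.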