Mathematical Formulation of Draft Model Prediction in Speculative Decoding

In speculative decoding, the draft-model prediction phase starts from a given prefix $[\mathbf{x}, \mathbf{y}_{\le i}]$. The draft model predicts the next $\tau$ consecutive tokens, $\hat{y}_{i+1}, \ldots, \hat{y}_{i+\tau}$. Generation proceeds token by token: each new token $\hat{y}_{i+t}$ is chosen greedily as the token with the highest probability under the draft model's distribution $\text{Pr}_q$, conditioned on the prefix and all previously generated draft tokens. Formally, for $t = 1, \ldots, \tau$:

$$\hat{y}_{i+t} = \arg\max_{y_{i+t}} \text{Pr}_q(y_{i+t} \mid \mathbf{x}, \mathbf{y}_{\le i}, \hat{y}_{i+1}, \ldots, \hat{y}_{i+t-1})$$
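The greedy drafting loop can be sketched in Python as below. This is a minimal illustration that assumes a hypothetical draft-model interface `pr_q`, a function that takes the token sequence so far and returns a probability vector over a small vocabulary. The names `draft_greedy` and `toy_pr_q` and the toy distribution are made up for this example; they are not a real library API or the source's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a draft model maps the token sequence seen so far
# (prefix plus earlier drafts) to a probability vector over the vocabulary.
DraftModel = Callable[[Sequence[int]], List[float]]


def draft_greedy(pr_q: DraftModel, x: Sequence[int],
                 y_prefix: Sequence[int], tau: int) -> List[int]:
    """Greedily draft tau tokens:
    y_hat[i+t] = argmax Pr_q(. | x, y_<=i, y_hat[i+1], ..., y_hat[i+t-1])."""
    context = list(x) + list(y_prefix)  # the prefix [x, y_<=i]
    drafts: List[int] = []
    for _ in range(tau):
        # Condition on the prefix and every draft token produced so far.
        probs = pr_q(context + drafts)
        # Greedy choice: index of the highest probability (first one on ties).
        next_tok = max(range(len(probs)), key=probs.__getitem__)
        drafts.append(next_tok)
    return drafts


# Toy draft model (purely illustrative): vocabulary of 5 tokens, with most
# probability mass on (last token + 1) mod 5.
def toy_pr_q(tokens: Sequence[int]) -> List[float]:
    vocab = 5
    favored = (tokens[-1] + 1) % vocab
    return [0.6 if v == favored else 0.1 for v in range(vocab)]


print(draft_greedy(toy_pr_q, x=[0, 1], y_prefix=[2], tau=4))  # → [3, 4, 0, 1]
```

Because each step conditions on all earlier drafts, the draft model is called $\tau$ times in sequence, once per drafted token.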

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
