Formula

Formula for Next Token Generation After Acceptance in Speculative Decoding

After accepting $n_a$ consecutive speculated tokens in speculative decoding, the verification model is used to make a new prediction for the token at position $i + n_a + 1$. The new token is selected to maximize the conditional probability according to the verification model's distribution $\text{Pr}_p$. This is given by the formula: $\bar{y}_{i+n_a+1} = \arg\max_{y_{i+n_a+1}} \text{Pr}_p(y_{i+n_a+1} \mid \mathbf{x}, \mathbf{y}_{\le i}, \hat{y}_{i+1} \ldots \hat{y}_{i+n_a})$, where the probability is conditioned on the original prefix and the accepted draft tokens.
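The selection rule can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not part of the source: the `Verifier` type, the function `next_token_after_acceptance`, and the `toy_verifier` stand-in, which simply favors the token id following the last context token. A real verification model would be a full language model returning a distribution over its vocabulary.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a verifier maps a token-id context to a
# probability distribution over the vocabulary (indexed by token id).
Verifier = Callable[[Sequence[int]], Sequence[float]]


def next_token_after_acceptance(
    verifier: Verifier,
    x: Sequence[int],
    y_prefix: Sequence[int],
    draft: Sequence[int],
    n_a: int,
) -> int:
    """Return argmax_y Pr_p(y | x, y_{<=i}, first n_a draft tokens)."""
    if not 0 <= n_a <= len(draft):
        raise ValueError("n_a must be between 0 and len(draft)")
    # Condition on the input, the tokens generated so far, and only the
    # accepted part of the draft; rejected draft tokens are dropped.
    context: List[int] = list(x) + list(y_prefix) + list(draft[:n_a])
    probs = verifier(context)
    # Greedy choice: the token id with the highest probability.
    return max(range(len(probs)), key=probs.__getitem__)


def toy_verifier(context: Sequence[int]) -> List[float]:
    """Toy stand-in (assumption): favors (last token + 1) mod vocab size."""
    vocab_size = 5
    favored = (context[-1] + 1) % vocab_size
    return [0.6 if t == favored else 0.1 for t in range(vocab_size)]


if __name__ == "__main__":
    x = [0]              # input tokens
    y_prefix = [1, 2]    # y_{<=i}, tokens already generated
    draft = [3, 4, 0]    # speculated tokens from the draft model
    # Two draft tokens accepted: the context ends in 4, so the toy
    # verifier favors token 0.
    print(next_token_after_acceptance(toy_verifier, x, y_prefix, draft, n_a=2))
```

With `n_a=0` the same function conditions on the prefix alone, which corresponds to the case where the first draft token is rejected and the verification model supplies the replacement.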

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences