1Cademy - Formula for Next Token Generation After Acceptance in Speculative Decoding

Learn Before

Post-Acceptance Token Generation in Speculative Decoding

Formula

Formula for Next Token Generation After Acceptance in Speculative Decoding

After $n_a$ consecutive speculated tokens have been accepted in speculative decoding, the verification model makes a new prediction for the token at position $i + n_a + 1$. The new token is the one that maximizes the conditional probability under the verification model's distribution $\text{Pr}_p$: $\bar{y}_{i+n_a+1} = \arg\max_{y_{i+n_a+1}} \text{Pr}_p(y_{i+n_a+1} \mid \mathbf{x}, \mathbf{y}_{\le i}, \hat{y}_{i+1}, \ldots, \hat{y}_{i+n_a})$, where the probability is conditioned on the original prefix $(\mathbf{x}, \mathbf{y}_{\le i})$ and the accepted draft tokens $\hat{y}_{i+1}, \ldots, \hat{y}_{i+n_a}$.
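The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `next_token_after_acceptance`, the `Verifier` type, and the `toy_verifier` stand-in with its made-up probabilities are all assumptions introduced here. The verifier is modeled as any callable that maps a token context to a next-token distribution, playing the role of $\text{Pr}_p$.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a verification model viewed as a function from a
# token context to a probability distribution over the next token.
Verifier = Callable[[Sequence[str]], Dict[str, float]]


def next_token_after_acceptance(
    verifier: Verifier,
    prefix: Sequence[str],
    accepted: Sequence[str],
) -> str:
    """Return the argmax token under the verifier's distribution.

    `prefix` stands for the original context (x, y_<=i) and `accepted`
    for the n_a accepted draft tokens; the verifier is conditioned on both.
    """
    context: List[str] = list(prefix) + list(accepted)
    probs = verifier(context)
    # arg max over candidate tokens at position i + n_a + 1
    return max(probs, key=probs.get)


def toy_verifier(context: Sequence[str]) -> Dict[str, float]:
    # Made-up probabilities for illustration only; a real verifier would
    # compute them from the context with the verification model.
    return {"a": 0.6, "b": 0.3, "c": 0.1}


if __name__ == "__main__":
    token = next_token_after_acceptance(toy_verifier, ["x1", "y1"], ["d1", "d2"])
    print(token)
```

The draft model does not appear in this step: once the draft tokens have been accepted, they only extend the context, and the next token is chosen from the verification model's distribution alone.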

Updated 2026-05-05

References

Reference of Foundations of Large Language Models Course

Learn After

Next Token Selection in an Accelerated Decoding Process
In an accelerated text generation process, a sequence has been extended. The confirmed prefix is 'The cat sat on the', and two subsequent tokens, 'mat and', have just been accepted. The system now needs to generate the very next token. The underlying evaluation model provides the following probabilities for potential next tokens, given the full context 'The cat sat on the mat and':

P('looked') = 0.55, P('slept') = 0.25, P('waited') = 0.15, P('the') = 0.05

According to the principle of selecting t
In a speculative decoding process, after a sequence of n draft tokens has been verified and accepted, the very next token (at position n+1) is generated by selecting the most likely token according to the draft model's probability distribution.
