Formula for Next Token Generation After Acceptance in Speculative Decoding
After $n$ consecutive speculated tokens have been accepted in speculative decoding, the verification model is used to make a new prediction for the token at position $n+1$. The new token is selected to maximize the conditional probability under the verification model's distribution $p$. This is given by the formula: $y_{n+1} = \arg\max_{y} \, p(y \mid \mathbf{x}, \hat{y}_1, \ldots, \hat{y}_n)$, where $\mathbf{x}$ is the original prefix and $\hat{y}_1, \ldots, \hat{y}_n$ are the accepted draft tokens, so the probability is conditioned on both.
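The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function `select_next_token`, the stand-in `toy_verifier`, and the example tokens and probabilities are all hypothetical. A real verification model would return a distribution over its whole vocabulary.

```python
from typing import Callable, Dict, List, Sequence

Distribution = Dict[str, float]


def select_next_token(
    verifier: Callable[[List[str]], Distribution],
    prefix: Sequence[str],
    accepted: Sequence[str],
) -> str:
    """Pick the token at position n+1 by argmax over the verifier's distribution.

    The verifier is conditioned on the original prefix followed by the
    n accepted draft tokens.
    """
    context = list(prefix) + list(accepted)
    probs = verifier(context)
    # argmax over candidate tokens under the verification model
    return max(probs, key=probs.get)


def toy_verifier(context: List[str]) -> Distribution:
    """Hypothetical stand-in for a verification model (fixed toy numbers)."""
    if context and context[-1] == "and":
        return {"purred": 0.6, "ran": 0.3, "the": 0.1}
    return {"the": 1.0}


prefix = ["The", "dog", "lay", "by", "the"]
accepted = ["fire", "and"]  # n = 2 accepted draft tokens
next_token = select_next_token(toy_verifier, prefix, accepted)
print(next_token)
```

Note that the draft model's probabilities play no part in this step: only the verification model's distribution is consulted once the accepted tokens are fixed.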

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A text generation system using speculative decoding has the confirmed output 'The cat sat on the'. A draft model then proposes the four-token sequence: 'mat and then slept'. The main verification model evaluates this draft and accepts the first two tokens ('mat', 'and'). What is the correct, immediate next action for the system to take to continue the generation process?
In a single step of a speculative decoding process, after the main model has compared its own probabilities with those of the draft model for a sequence of candidate tokens, what is the correct order of operations to finalize the output for that step?
Diagram of Post-Acceptance Token Prediction in Speculative Decoding
Token Generation After Speculative Acceptance
Learn After
Next Token Selection in an Accelerated Decoding Process
In an accelerated text generation process, a sequence has been extended. The confirmed prefix is 'The cat sat on the', and two subsequent tokens, 'mat and', have just been accepted. The system now needs to generate the very next token. The underlying evaluation model provides the following probabilities for potential next tokens, given the full context 'The cat sat on the mat and':
P('looked') = 0.55, P('slept') = 0.25, P('waited') = 0.15, P('the') = 0.05
According to the principle of selecting the token with the highest probability from the evaluation model's distribution at this step, which token will be chosen next?
In a speculative decoding process, after a sequence of $n$ draft tokens has been verified and accepted, the very next token (at position $n+1$) is generated by selecting the most likely token according to the draft model's probability distribution.