Post-Acceptance Token Generation in Speculative Decoding
Once the number of consecutively accepted draft tokens, n, is known, those n tokens are appended to the final output. The evaluation model then predicts and generates the very next token at position n + 1 (relative to where the draft block began), and the sequence continues autoregressively from this new point.
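The step described above can be sketched as a single function. This is a minimal illustration, not a full implementation: `draft_propose`, `target_verify`, and `target_next` are hypothetical stand-ins for the draft model, the evaluation model's verification pass, and the evaluation model's own sampler.

```python
def speculative_step(prefix, draft_propose, target_verify, target_next):
    """One speculative decoding step (illustrative sketch).

    draft_propose(prefix)            -> list of candidate tokens
    target_verify(prefix, cands)     -> list of booleans, one verdict per candidate
    target_next(prefix)              -> one token sampled by the evaluation model
    """
    candidates = draft_propose(prefix)
    verdicts = target_verify(prefix, candidates)

    # n = length of the accepted prefix: stop at the first rejection.
    n = 0
    for ok in verdicts:
        if not ok:
            break
        n += 1

    accepted = candidates[:n]
    # After appending the n accepted tokens, the evaluation model itself
    # generates the token at position n + 1 of the draft block.
    bonus = target_next(prefix + accepted)
    return prefix + accepted + [bonus]
```

For example, with toy callables that accept the first two of three candidates and then emit 'rug', the step extends ['The', 'cat', 'sat'] to ['The', 'cat', 'sat', 'on', 'the', 'rug'].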
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for the Number of Consecutively Accepted Tokens in Speculative Decoding
In an accelerated text generation method, a sequence of candidate tokens is proposed and then individually verified. The verification results for a sequence of 5 tokens, in order, are: [Accepted, Accepted, Rejected, Accepted, Accepted]. According to the rules of this method, a continuous block of accepted tokens from the beginning of the sequence is appended to the final output, and the process halts at the first rejected token. How many tokens from this proposed sequence will be appended to the final output?
Evaluating a Speculative Decoding Step
Diagram of Post-Acceptance Token Prediction in Speculative Decoding
Rationale for Consecutive Acceptance in an Accelerated Generation Method
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
Root-Causing Low Speedup Despite Parallel Verification
Explaining a “Fast but Wrong” Speculative Decoding Regression
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Acceptance and Rejection Criteria for Speculated Tokens
Set of Accepted Draft Tokens
Set of Tokens Generated in a Single Speculative Decoding Step
In a text generation process designed for speed, an initial sequence ['The', 'cat', 'sat'] is extended. A fast proposal mechanism suggests the candidate tokens ['on', 'the', 'mat']. A more accurate, final-check mechanism then processes these candidates and produces the final, complete sequence: ['The', 'cat', 'sat', 'on', 'the', 'rug']. Based on this outcome, how many of the candidate tokens were accepted before the final-check mechanism generated its own token?
In a text generation process that uses a fast model to propose candidate tokens and a more accurate main model to check them, a single generation step has just completed. Arrange the following components to correctly represent the structure of the full, updated text sequence.
Visual Representation of a Speculative Decoding Step's Output
Analyzing a Speculative Generation Step
Learn After
Formula for Next Token Generation After Acceptance in Speculative Decoding
A text generation system using speculative decoding has the confirmed output 'The cat sat on the'. A draft model then proposes the four-token sequence: 'mat and then slept'. The main verification model evaluates this draft and accepts the first two tokens ('mat', 'and'). What is the correct, immediate next action for the system to take to continue the generation process?
In a single step of a speculative decoding process, after the main model has compared its own probabilities with those of the draft model for a sequence of candidate tokens, what is the correct order of operations to finalize the output for that step?
Token Generation After Speculative Acceptance