1Cademy - Structure of the Full Sequence After a Speculative Decoding Step

Learn Before

Draft Model in Speculative Decoding
Verification Model in Speculative Decoding
Speculative Decoding Algorithm

Concept

Structure of the Full Sequence After a Speculative Decoding Step

The full sequence after one step of speculative decoding consists of three parts, in order: the existing context, the draft tokens that were accepted, and one final token from the verification model. Schematically: $[\mathbf{x}, \mathbf{y}_{\le i}],\ \hat{y}_{i+1} \dots \hat{y}_{i+n_a},\ \bar{y}_{i+n_a+1}$. Here, $[\mathbf{x}, \mathbf{y}_{\le i}]$ is the context: the prompt $\mathbf{x}$ together with the $i$ tokens confirmed in earlier steps. It is followed by $\hat{y}_{i+1} \dots \hat{y}_{i+n_a}$, the $n_a$ draft tokens accepted in this step, and ends with $\bar{y}_{i+n_a+1}$, the single token generated by the verification model. A step therefore extends the confirmed sequence by $n_a + 1$ tokens.
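The three-part structure above can be sketched in a few lines of Python. This is only an illustration, not an implementation of speculative decoding: the function names `assemble_step_output` and `count_accepted` are hypothetical, tokens are represented as plain strings, and the acceptance decision is assumed to have already been made, so the number of accepted tokens is passed in directly.

```python
from typing import List, Sequence


def assemble_step_output(
    context: Sequence[str],
    draft_tokens: Sequence[str],
    n_accepted: int,
    verifier_token: str,
) -> List[str]:
    """Build context + accepted draft prefix + one verifier token.

    Hypothetical helper: n_accepted plays the role of n_a and is assumed
    to come from a verification pass that is not modeled here.
    """
    if not 0 <= n_accepted <= len(draft_tokens):
        raise ValueError("n_accepted must be between 0 and len(draft_tokens)")
    accepted = list(draft_tokens[:n_accepted])  # only a prefix of the draft is kept
    return list(context) + accepted + [verifier_token]


def count_accepted(context: Sequence[str], full_sequence: Sequence[str]) -> int:
    """Recover n_a from the lengths: new tokens = n_a accepted + 1 verifier token."""
    return len(full_sequence) - len(context) - 1


context = ["The", "cat", "sat"]
draft = ["on", "the", "mat"]

# Assume the verifier accepted "on" and "the", then produced "rug" itself.
full = assemble_step_output(context, draft, n_accepted=2, verifier_token="rug")
print(full)                           # ['The', 'cat', 'sat', 'on', 'the', 'rug']
print(count_accepted(context, full))  # 2
```

Because every step ends with exactly one verifier token, the number of accepted draft tokens can be read off from the sequence lengths alone, as `count_accepted` does.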

Updated 2026-05-02

References

Reference of Foundations of Large Language Models Course

Learn After

Post-Acceptance Token Generation in Speculative Decoding
Set of Tokens Generated in a Single Speculative Decoding Step
In a text generation process designed for speed, an initial sequence ['The', 'cat', 'sat'] is extended. A fast proposal mechanism suggests the candidate tokens ['on', 'the', 'mat']. A more accurate, final-check mechanism then processes these candidates and produces the final, complete sequence: ['The', 'cat', 'sat', 'on', 'the', 'rug']. Based on this outcome, how many of the candidate tokens were accepted before the final-check mechanism generated its own token?
In a text generation process that uses a fast model to propose candidate tokens and a more accurate main model to check them, a single generation step has just completed. Arrange the following components to correctly represent the structure of the full, updated text sequence.
Visual Representation of a Speculative Decoding Step's Output
Analyzing a Speculative Generation Step
Set of Accepted Draft Tokens in Speculative Decoding
