Visual Representation of a Speculative Decoding Step's Output
This diagram illustrates the composition of the output sequence after a single step of speculative decoding. The sequence is formed by three distinct parts:
- The initial context, which includes the prompt and all previously confirmed tokens.
- A sequence of accepted draft tokens, which were predicted by the draft model.
- One final token, which is predicted by the verification model to extend the sequence.
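
The three-part composition above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes greedy exact-match acceptance (a draft token is accepted only if it equals the verification model's own prediction at that position), whereas practical systems may use a probabilistic acceptance rule. The function name `compose_step_output` and the `verifier_tokens` input are hypothetical, introduced here only for illustration.

```python
from typing import List, Sequence, Tuple


def compose_step_output(
    context: Sequence[str],
    draft_tokens: Sequence[str],
    verifier_tokens: Sequence[str],
) -> Tuple[List[str], int]:
    """Build the sequence produced by one speculative decoding step.

    Assumption: verifier_tokens[i] is the verification model's prediction
    at draft position i, and there is one extra entry for the position
    right after the last draft token.
    """
    if len(verifier_tokens) != len(draft_tokens) + 1:
        raise ValueError("verifier_tokens must have len(draft_tokens) + 1 entries")

    # Count the longest prefix of draft tokens the verifier agrees with.
    accepted = 0
    for draft, verified in zip(draft_tokens, verifier_tokens):
        if draft != verified:
            break
        accepted += 1

    # The verifier always contributes exactly one token: either a correction
    # at the first mismatch, or an extra token if every draft was accepted.
    final_token = verifier_tokens[accepted]

    # Output = context + accepted draft tokens + one verifier token.
    sequence = list(context) + list(draft_tokens[:accepted]) + [final_token]
    return sequence, accepted


# Illustrative inputs (the verifier predictions are made up for this example).
sequence, accepted = compose_step_output(
    context=["The", "cat", "sat"],
    draft_tokens=["on", "the", "mat"],
    verifier_tokens=["on", "the", "rug", "."],
)
print(sequence, accepted)
```

Under these assumptions, every step extends the sequence by at least one token (when no draft token is accepted) and by at most the number of draft tokens plus one (when all are accepted).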

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Related
Post-Acceptance Token Generation in Speculative Decoding
Set of Accepted Draft Tokens
Set of Tokens Generated in a Single Speculative Decoding Step
In a text generation process designed for speed, an initial sequence ['The', 'cat', 'sat'] is extended. A fast proposal mechanism suggests the candidate tokens ['on', 'the', 'mat']. A more accurate, final-check mechanism then processes these candidates and produces the final, complete sequence: ['The', 'cat', 'sat', 'on', 'the', 'rug']. Based on this outcome, how many of the candidate tokens were accepted before the final-check mechanism generated its own token?
In a text generation process that uses a fast model to propose candidate tokens and a more accurate main model to check them, a single generation step has just completed. Arrange the following components to correctly represent the structure of the full, updated text sequence.
Analyzing a Speculative Generation Step