Example

Visual Representation of a Speculative Decoding Step's Output

This diagram illustrates the composition of the output sequence after a single step of speculative decoding. The sequence is formed by three distinct parts:

  1. The initial context, represented as $[\mathbf{x}, \mathbf{y}_{<i}]$, which includes the prompt $\mathbf{x}$ and all previously confirmed tokens $\mathbf{y}_{<i}$.
  2. A sequence of $n_a$ accepted draft tokens, $\hat{y}_{i+1}, \dots, \hat{y}_{i+n_a}$, which were predicted by the draft model.
  3. One final token, $\bar{y}_{i+n_a+1}$, which is predicted by the verification model to extend the sequence.
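As a rough illustration of how these three parts arise, the following Python sketch performs one step of speculative decoding. It assumes a greedy (argmax) acceptance rule, in which a draft token is kept only if it matches the verification model's own prediction at that position; the text above does not specify the acceptance rule, and sampling-based variants use a probabilistic acceptance test instead. The function `speculative_step` and the toy models `draft` and `verify` are hypothetical stand-ins for real language models.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(context: List[int], draft: NextToken,
                     verify: NextToken, k: int) -> List[int]:
    """One greedy speculative decoding step (illustrative sketch).

    Returns context + n_a accepted draft tokens + 1 verifier token.
    """
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. Check each draft token against the verification model's prediction.
    #    (A real system scores all positions in one parallel forward pass.)
    accepted: List[int] = []
    for tok in proposal:
        target = verify(context + accepted)
        if target != tok:
            # First mismatch: the verifier's token replaces the rejected draft.
            return context + accepted + [target]
        accepted.append(tok)

    # 3. All k drafts accepted (n_a = k): the verifier adds one extra token.
    return context + accepted + [verify(context + accepted)]


# Toy "models" (hypothetical): the verifier counts upward by one;
# the draft agrees with it only while the sequence has fewer than 5 tokens.
def verify(seq: Sequence[int]) -> int:
    return seq[-1] + 1


def draft(seq: Sequence[int]) -> int:
    return seq[-1] + 1 if len(seq) < 5 else 0


print(speculative_step([0, 1, 2], draft, verify, k=4))  # [0, 1, 2, 3, 4, 5]
```

Here the context is `[0, 1, 2]`, the draft tokens `3` and `4` are accepted ($n_a = 2$), and the verifier supplies $\bar{y}_{i+3} = 5$ in place of the rejected draft token. In general, each step appends $n_a + 1$ tokens, so the sequence always advances by at least one token even when every draft token is rejected.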

Updated 2025-10-07


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences