Concept

Structure of the Full Sequence After a Speculative Decoding Step

The complete output sequence after one step of speculative decoding is composed of three parts: the original context, the accepted draft tokens, and a final token from the verification model. This structure can be represented schematically as:

$$[\mathbf{x}, \mathbf{y}_{\le i}] \, \hat{y}_{i+1} \ldots \hat{y}_{i+n_a} \, \bar{y}_{i+n_a+1}$$

Here, $[\mathbf{x}, \mathbf{y}_{\le i}]$ is the context, which includes the prompt and the previously confirmed tokens. It is followed by $\hat{y}_{i+1} \ldots \hat{y}_{i+n_a}$, the sequence of $n_a$ accepted draft tokens, and is completed by $\bar{y}_{i+n_a+1}$, the single token generated by the verification model.
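The three-part structure can be illustrated with a minimal Python sketch. It assumes a greedy acceptance rule (a draft token is accepted only if it equals the verification model's own next token), which is one common choice and is not specified in the text above. The names `speculative_step`, `verify_next`, and `toy_verify_next` are hypothetical and introduced only for illustration. A real system would score all draft positions in a single parallel pass of the verification model; the sequential calls here are for readability.

```python
from typing import Callable, List, Sequence, Tuple


def speculative_step(
    context: Sequence[int],
    draft_tokens: Sequence[int],
    verify_next: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Assemble the sequence after one speculative decoding step.

    Assumption: greedy verification. `verify_next(prefix)` is a hypothetical
    callable returning the token the verification model would emit after
    `prefix`.
    """
    prefix = list(context)  # [x, y_{<=i}]: prompt plus confirmed tokens
    n_accepted = 0          # n_a: number of accepted draft tokens

    for token in draft_tokens:
        # Stop at the first draft token the verifier disagrees with.
        if verify_next(prefix) != token:
            break
        prefix.append(token)  # accepted draft token (a y-hat)
        n_accepted += 1

    # The verification model always contributes exactly one final token
    # (the y-bar at position i + n_a + 1).
    final_token = verify_next(prefix)
    return prefix + [final_token], n_accepted


def toy_verify_next(prefix: List[int]) -> int:
    """Toy stand-in for a verification model: predicts last token + 1."""
    return prefix[-1] + 1


if __name__ == "__main__":
    full, n_a = speculative_step([1, 2, 3], [4, 5, 9, 10], toy_verify_next)
    print(full, n_a)
```

In this toy run the context is `[1, 2, 3]`, the first two draft tokens are accepted, the third is rejected, and the verifier supplies the final token, so the result has the form context, then $n_a = 2$ accepted tokens, then one verifier token.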

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
