Example of Draft Token Generation in Speculative Decoding

To illustrate the draft generation phase, the initial step in speculative decoding, consider a scenario where the draft model predicts a sequence of $\tau = 5$ candidate tokens. Starting from a given context $(\mathbf{x}, \mathbf{y}_{<i})$, the draft model uses its probability distribution $\Pr_q(\cdot)$ to autoregressively generate five tokens, $\hat{y}_i, \hat{y}_{i+1}, \hat{y}_{i+2}, \hat{y}_{i+3}, \hat{y}_{i+4}$. Each token in the draft sequence is predicted from the initial context together with all draft tokens generated so far within the current step.
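The autoregressive draft loop described above can be sketched as follows. This is a minimal illustration, not a real implementation: `draft_next_token` is a hypothetical placeholder standing in for sampling from the draft model's distribution $\Pr_q(\cdot)$, and the token ids are toy values.

```python
def draft_next_token(context):
    # Hypothetical stand-in for the draft model: a real system would
    # sample a token from Pr_q( . | context). Here we return a
    # deterministic toy token id just to make the loop runnable.
    return (sum(context) + len(context)) % 50000

def generate_draft(context, tau=5):
    """Autoregressively generate tau draft tokens.

    Each draft token is conditioned on the original context plus
    all draft tokens already generated in this step, mirroring the
    draft phase of speculative decoding.
    """
    drafts = []
    for _ in range(tau):
        token = draft_next_token(context + drafts)
        drafts.append(token)
    return drafts

# Toy context token ids; with tau=5 this yields five draft tokens.
tokens = generate_draft([101, 2009, 2003], tau=5)
print(len(tokens))  # 5
```

The key point the sketch shows is that the loop extends the conditioning context with each new draft token (`context + drafts`), so every prediction depends on all earlier drafts in the step, exactly as in the prose description.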

Updated 2025-10-09

Ch.5 Inference - Foundations of Large Language Models