Example

Diagram of Post-Acceptance Token Prediction in Speculative Decoding

This diagram illustrates one step of speculative decoding in which every draft token is accepted. Given a context (x, y_i), the draft model Pr_q(·) generates three candidate tokens: \hat{y}_{i+1}, \hat{y}_{i+2}, \hat{y}_{i+3}. Once all three are accepted, the evaluation model Pr_p(·) predicts the next token, \bar{y}_{i+4}. The sequence therefore grows by four tokens in this step: three from the draft model and one from the evaluation model.
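The step described above can be sketched in a few lines of Python. This is a minimal toy illustration, not the method of any particular library: `draft_probs`, `target_probs`, `VOCAB`, and `speculative_step` are hypothetical names, both models are hand-written toy distributions, and the acceptance test (accept a draft token with probability min(1, p/q), otherwise resample from the normalized positive part of p − q) is assumed from the commonly used speculative sampling rule rather than taken from this page. The two toy models are deliberately identical so that every draft token is accepted, which is the case the diagram shows.

```python
import random

VOCAB = ["a", "b", "c", "d"]  # toy vocabulary (assumption for illustration)

def draft_probs(context):
    """Hypothetical draft model Pr_q: next-token distribution over VOCAB."""
    # Toy rule: favour the token that follows the last one, cyclically.
    last = VOCAB.index(context[-1])
    probs = [0.1] * len(VOCAB)
    probs[(last + 1) % len(VOCAB)] = 0.7
    return probs

def target_probs(context):
    """Hypothetical evaluation model Pr_p; identical to Pr_q in this toy."""
    return draft_probs(context)

def sample(probs, rng):
    """Draw one token index from an (unnormalized) weight vector."""
    return rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]

def speculative_step(context, num_draft, rng):
    """Run one draft-then-verify step; return (new_sequence, all_accepted)."""
    # 1. The draft model proposes num_draft tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(context)
    for _ in range(num_draft):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(VOCAB[tok])

    # 2. The evaluation model checks each draft token in order.
    #    (Real systems score all positions in one parallel pass;
    #    the loop here is sequential only for readability.)
    accepted = list(context)
    for tok, q in zip(drafted, q_dists):
        p = target_probs(accepted)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(VOCAB[tok])
        else:
            # Rejection: resample from the positive part of p - q and stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            accepted.append(VOCAB[sample(residual, rng)])
            return accepted, False

    # 3. All drafts accepted: the evaluation model predicts one more token.
    p = target_probs(accepted)
    accepted.append(VOCAB[sample(p, rng)])
    return accepted, True

rng = random.Random(0)
sequence, all_accepted = speculative_step(["a"], num_draft=3, rng=rng)
print(sequence, all_accepted)
```

With three draft tokens and a one-token context, a fully accepted step returns five tokens: the context, the three drafts, and the extra token from the evaluation model.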

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
