Example

Visual Representation of the Verification Phase in Speculative Decoding

This diagram illustrates the verification phase of speculative decoding. After the draft model has generated a sequence of candidate tokens (here five, denoted \hat{y}_{i+1} through \hat{y}_{i+5}), the larger, more accurate evaluation model, represented as Pr_p(·), scores the entire sequence in a single parallel forward pass, as indicated by the arrow. The resulting probabilities are what the subsequent step uses to accept or reject each candidate token.

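[Figure: the evaluation model Pr_p(·) scoring the five draft tokens \hat{y}_{i+1} through \hat{y}_{i+5} in a single forward pass]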


Updated 2025-10-09


Tags

Ch.5 Inference - Foundations of Large Language Models
