Activity (Process)

Evaluation of Draft Tokens by the Verification Model

In the verification phase of speculative decoding, the larger verification model (also called the evaluation model) evaluates the entire sequence of draft tokens, $(\hat{y}_{i+1}, \ldots, \hat{y}_{i+\tau})$, in a single parallel forward pass. Using its probability distribution, denoted $\Pr_p(\cdot)$, it computes the probability of each draft token given the preceding context, that is, the prefix together with any earlier draft tokens. These probabilities then feed the subsequent acceptance-or-rejection decision for each token.
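The single-pass evaluation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_verifier_logits` is a hypothetical stand-in for the verification model (a causal toy function, not a real language model), and `score_draft_tokens` is a hypothetical helper name. The sketch only shows how one forward pass over the prefix plus the draft yields the probability of every draft token at once.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_verifier_logits(tokens, vocab_size=8, seed=0):
    """Hypothetical stand-in for the verification model.

    Returns one row of next-token logits per input position. It is causal:
    row t depends only on tokens[0..t], as in a decoder-only Transformer.
    """
    rng = np.random.default_rng(seed)
    embed = rng.normal(size=(vocab_size, vocab_size))
    x = embed[np.asarray(tokens)]
    counts = np.arange(1, len(tokens) + 1)[:, None]
    return np.cumsum(x, axis=0) / counts  # running mean over the past

def score_draft_tokens(prefix, draft, logits_fn):
    """Probability of each draft token under the verifier, from one pass.

    Assumes a non-empty prefix. Row t of the output predicts the token at
    position t + 1, so draft token j is read from row len(prefix) - 1 + j.
    """
    tokens = list(prefix) + list(draft)
    probs = softmax(logits_fn(tokens))  # single pass over all positions
    rows = np.arange(len(prefix) - 1, len(prefix) - 1 + len(draft))
    return probs[rows, np.asarray(draft)]

prefix = [1, 2, 3]   # tokens already accepted
draft = [4, 0, 5]    # tokens proposed by the draft model
p_draft = score_draft_tokens(prefix, draft, toy_verifier_logits)
print(p_draft)       # one probability per draft token
```

Because the toy model is causal, these values match what a token-by-token evaluation would produce; the parallel pass simply obtains all of them in one call, which is where the speedup in verification comes from.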

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
