Learn Before
Evaluation of Draft Tokens by the Verification Model
In the verification phase of speculative decoding, the larger verification model evaluates the entire sequence of draft tokens in a single, parallel forward pass. This model, also known as the evaluation model, uses its own probability distribution to compute the likelihood of each draft token. These probabilities then drive the subsequent acceptance or rejection decision for each token.
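The per-token accept/reject decision can be sketched as follows. This is a minimal illustration, not the book's implementation: the function name, token strings, and probability values are made up, and the verification-model probabilities `p` are passed in precomputed, standing in for the single parallel forward pass described above. The acceptance rule shown (accept a draft token with probability min(1, p/q)) is the standard rejection-sampling criterion used in speculative decoding.

```python
import random

def verify_draft(draft_tokens, q, p, rng=random.random):
    """Decide, position by position, which draft tokens to accept.

    draft_tokens: tokens proposed by the small draft model
    q[i]: draft-model probability assigned to draft_tokens[i]
    p[i]: verification-model probability for the same token
          (in practice, all p[i] come from ONE parallel forward pass)
    rng:  source of uniform randoms in [0, 1); injectable for testing

    Returns the length of the accepted prefix.
    """
    for i in range(len(draft_tokens)):
        # Standard acceptance rule: accept with probability min(1, p/q).
        if rng() >= min(1.0, p[i] / q[i]):
            return i  # first rejection: keep only the tokens before it
    return len(draft_tokens)  # every draft token was accepted

# Illustrative values mirroring the [jumped, over, the, moon] example:
tokens = ["jumped", "over", "the", "moon"]
q = [0.8, 0.7, 0.9, 0.6]   # draft-model probabilities (assumed)
p = [0.9, 0.6, 0.2, 0.5]   # verification-model probabilities (assumed)

# With a fixed rng of 0.5, the first two tokens clear the min(1, p/q)
# threshold and the third does not, so two tokens are accepted.
accepted = verify_draft(tokens, q, p, rng=lambda: 0.5)
print(accepted)  # → 2
```

Injecting `rng` keeps the sketch deterministic for the example; in real use the default `random.random` preserves the stochastic acceptance rule that makes the output distribution match the verification model's.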

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Parallel Verification in Speculative Decoding
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Conditional Probability Distribution of the Draft Model in Speculative Decoding
Evaluation of Draft Tokens by the Verification Model
Structure of the Full Sequence After a Speculative Decoding Step
A text generation system uses two models: a small, fast 'draft' model and a large, accurate 'verification' model to speed up output. Arrange the following events to correctly represent one cycle of this generation process, starting from a given text prefix.
A text generation system uses a fast 'draft' model and a more accurate 'verification' model. The draft model proposes the 4-token sequence:
[jumped, over, the, moon]. The verification model then evaluates this sequence and determines that the first two tokens (jumped, over) are correct, but the third token (the) is incorrect. Based on the rules of this generation algorithm, what is the immediate result of this verification step?
Efficiency Limits of a Two-Model Generation System
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Explaining a “Fast but Wrong” Speculative Decoding Regression
Root-Causing Low Speedup Despite Parallel Verification
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
In a production LLM service using speculative deco...
You are reviewing logs from a production LLM endpo...
You are implementing speculative decoding in a cus...
Learn After
Determining the Maximum Number of Consecutively Accepted Tokens in Speculative Decoding
In a text generation acceleration technique, a small, fast 'draft' model proposes a sequence of candidate tokens (e.g., 5 tokens). A larger, more accurate 'target' model then takes this entire 5-token sequence and computes the correct probability distribution for each of the 5 positions simultaneously in a single forward pass. What is the primary advantage of this parallel evaluation by the target model compared to a standard approach where the large model generates tokens one by one?
Analyzing a Text Generation Acceleration Design
Mathematical Formulation of Verification Model Evaluation in Speculative Decoding
Visual Representation of the Verification Phase in Speculative Decoding
Diagram of the Acceptance/Rejection Outcome from an Evaluation Model
In a text generation acceleration technique where a draft model proposes a sequence of tokens, the larger verification model, during its single parallel evaluation pass, directly outputs a final 'accept' or 'reject' decision for each token, bypassing the need to compute its own probability distribution for those token positions.