Diagram of Post-Acceptance Token Prediction in Speculative Decoding
This diagram illustrates a step in speculative decoding following the acceptance of draft tokens. Given a context (x, y_i), the draft model Pr_q(·) has generated three candidate tokens: ŷ_{i+1}, ŷ_{i+2}, ŷ_{i+3}. Once all three tokens are accepted, the evaluation model Pr_p(·) is used to predict the subsequent token, ȳ_{i+4}. This demonstrates how the sequence is extended after a fully successful speculation.
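The step shown in the diagram can be sketched in code. This is a minimal illustration, not the book's implementation: the `draft_model`/`target_model` callables, the greedy draft proposal, and the toy probabilities are all assumptions made for the sketch.

```python
import random

def speculative_step(context, draft_model, target_model, k=3):
    """One speculative decoding step: draft k tokens with the cheap
    model Pr_q, verify them with the target model Pr_p, and, if all k
    are accepted, let the target model predict one more token.

    draft_model(ctx)  -> (token, prob_of_token), a greedy draft proposal
    target_model(ctx) -> dict mapping token -> probability under Pr_p
    """
    # 1. The draft model proposes k candidate tokens autoregressively.
    drafted = []
    ctx = list(context)
    for _ in range(k):
        tok, q = draft_model(ctx)
        drafted.append((tok, q))
        ctx.append(tok)

    # 2. The target model verifies each draft token in order: accept
    #    with probability min(1, p/q), and stop at the first rejection.
    accepted = []
    ctx = list(context)
    for tok, q in drafted:
        p = target_model(ctx).get(tok, 0.0)
        if random.random() < min(1.0, p / q):
            accepted.append(tok)
            ctx.append(tok)
        else:
            break

    # 3. If every draft token was accepted (the case in the diagram),
    #    the target model itself predicts the extra token y_{i+4}.
    if len(accepted) == k:
        dist = target_model(ctx)
        accepted.append(max(dist, key=dist.get))
    return accepted
```

In the fully accepted case this returns k + 1 new tokens: the three draft tokens plus the evaluation model's own next-token prediction.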

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for the Number of Consecutively Accepted Tokens in Speculative Decoding
Post-Acceptance Token Generation in Speculative Decoding
In an accelerated text generation method, a sequence of candidate tokens is proposed and then individually verified. The verification results for a sequence of 5 tokens, in order, are: [Accepted, Accepted, Rejected, Accepted, Accepted]. According to the rules of this method, a continuous block of accepted tokens from the beginning of the sequence is appended to the final output, and the process halts at the first rejected token. How many tokens from this proposed sequence will be appended to the final output?
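The acceptance rule this question tests (append the accepted prefix, halt at the first rejection) amounts to counting the leading run of accepts. A small sketch, with the verdict list taken from the question:

```python
def accepted_prefix_len(verdicts):
    """Count verified tokens up to (not including) the first rejection."""
    n = 0
    for accepted in verdicts:
        if not accepted:
            break
        n += 1
    return n

# The sequence from the question: the accepts that come after the
# rejection do not count toward the output.
verdicts = [True, True, False, True, True]
# accepted_prefix_len(verdicts) -> 2
```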
Evaluating a Speculative Decoding Step
Rationale for Consecutive Acceptance in an Accelerated Generation Method
You are implementing speculative decoding in a cus...
In a production LLM service using speculative deco...
You are reviewing logs from a production LLM endpo...
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
Root-Causing Low Speedup Despite Parallel Verification
Explaining a “Fast but Wrong” Speculative Decoding Regression
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Acceptance and Rejection Criteria for Speculated Tokens
Formula for Next Token Generation After Acceptance in Speculative Decoding
A text generation system using speculative decoding has the confirmed output 'The cat sat on the'. A draft model then proposes the four-token sequence: 'mat and then slept'. The main verification model evaluates this draft and accepts the first two tokens ('mat', 'and'). What is the correct, immediate next action for the system to take to continue the generation process?
In a single step of a speculative decoding process, after the main model has compared its own probabilities with those of the draft model for a sequence of candidate tokens, what is the correct order of operations to finalize the output for that step?
Token Generation After Speculative Acceptance
Learn After
In a single step of a text generation process, a small, fast model proposes the candidate token sequence ['on', 'the', 'mat'] to extend the existing text ['The', 'cat', 'sat']. A larger, more accurate model then evaluates these candidates. The larger model accepts 'on' and 'the', but rejects 'mat'. After this rejection, the larger model's own prediction for the next token is 'rug'. What is the complete sequence of new tokens added to the text in this step?
Correcting a Step in a Generation Process
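The correction rule this question exercises (keep the accepted prefix, then append the verifier's own replacement for the rejected token) can be sketched as follows; the function name and verdict encoding are illustrative, not from the text:

```python
def tokens_added(draft, verdicts, correction):
    """Tokens appended in one step when a draft token is rejected:
    the accepted prefix followed by the main model's own prediction,
    which replaces the rejected token."""
    kept = []
    for tok, ok in zip(draft, verdicts):
        if not ok:
            break
        kept.append(tok)
    kept.append(correction)
    return kept

# Example from the question:
# tokens_added(['on', 'the', 'mat'], [True, True, False], 'rug')
#   -> ['on', 'the', 'rug']
```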
In a speculative decoding process, a draft model proposes a 3-token sequence, and the main evaluation model accepts all three tokens. Arrange the following actions in the correct chronological order to describe how the very next token is generated.