Learn Before
Speculative Decoding
Speculative decoding is an LLM inference technique that accelerates generation by drawing inspiration from speculative execution, in which a system predicts and performs future work in advance. The method employs a small, fast 'draft model' to propose a sequence of candidate tokens, which a larger, more accurate 'verification model' then evaluates in a single parallel forward pass. Candidate tokens are accepted up to the first position where the verification model disagrees; the mismatched token and every draft after it are discarded, the verification model supplies the correct token at that position, and the cycle repeats. Because one large-model pass can validate several tokens at once, the approach yields a speedup whenever the draft model's proposals are accepted often enough to outweigh its own overhead.
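To make the propose/verify loop concrete, below is a minimal greedy sketch in PyTorch. It assumes Hugging Face-style causal LMs (calling the model returns an object with `.logits` of shape `(batch, seq_len, vocab)`), batch size 1, and greedy decoding for both models; `speculative_decode`, `k`, and the model arguments are illustrative names, not an established API. A production implementation would also add KV caching and the probabilistic acceptance rule that provably preserves the verification model's sampling distribution.

```python
import torch

@torch.no_grad()
def speculative_decode(target_model, draft_model, input_ids,
                       max_new_tokens=64, k=4):
    # Greedy speculative decoding sketch: the draft model proposes k tokens,
    # the target ("verification") model checks them in one forward pass.
    # Assumes batch size 1 and HF-style models exposing `.logits`.
    ids = input_ids
    start_len = input_ids.shape[1]
    while ids.shape[1] - start_len < max_new_tokens:
        # 1. Draft model proposes k candidate tokens autoregressively (cheap,
        #    since the draft model is small).
        draft_ids = ids
        for _ in range(k):
            logits = draft_model(draft_ids).logits[:, -1, :]
            next_token = logits.argmax(dim=-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_token], dim=1)

        # 2. Verification model scores all k candidates in a single pass.
        target_logits = target_model(draft_ids).logits
        # The target's greedy choice at each of the k drafted positions.
        target_choice = target_logits[:, ids.shape[1] - 1:-1, :].argmax(dim=-1)
        proposed = draft_ids[:, ids.shape[1]:]

        # 3. Accept the longest prefix on which draft and target agree.
        agree = (proposed == target_choice).int()[0]
        n_accepted = int(agree.cumprod(dim=0).sum())

        # 4. Keep the accepted tokens; on the first mismatch, substitute the
        #    verification model's token and restart drafting from there.
        ids = draft_ids[:, : ids.shape[1] + n_accepted]
        if n_accepted < k:
            correction = target_choice[:, n_accepted:n_accepted + 1]
            ids = torch.cat([ids, correction], dim=1)
    return ids[:, : start_len + max_new_tokens]
```

Note that each loop iteration costs one large-model forward pass no matter how many of the k drafts survive, which is why the technique only pays off when the draft model's acceptance rate is high.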
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sampling-Based Search for LLM Inference
Sequence Evaluation using Log-Probability
Deterministic Decoding Algorithms
Modifying the Search Objective to Improve Decoding
Maximum a Posteriori (MAP) Decoding
Speculative Decoding
Structured Search in Decoding
Trade-off between Search Quality and Computational Efficiency in Heuristic Search
An engineer is building a real-time chatbot that must respond to user queries very quickly. To achieve this speed, the engineer implements a text generation strategy that, at each step of forming a response, considers only a small subset of the most likely next words instead of all possible words in the vocabulary. What is the fundamental trade-off inherent in this design choice?
Evaluating a Decoding Algorithm Claim
Analysis of Competing Text Generation Systems
Learn After
Two-Model Architecture of Speculative Decoding
Speculative Decoding Algorithm
Evaluating an Inference Optimization Technique
A team is implementing an inference optimization technique where a small, fast model proposes a sequence of several tokens, and a large, accurate model then validates this entire sequence in a single step. What is the most critical factor for this technique to achieve a significant speedup compared to generating tokens one by one with the large model?
A development team implements an inference optimization method using a small, fast model to propose several tokens at once, which are then checked by a larger, more accurate model. They are surprised to find that the overall generation speed is nearly identical to using only the large model. Which of the following scenarios best explains this lack of performance improvement?