Speculative Decoding Algorithm

Speculative decoding accelerates text generation by using a draft model to predict a sequence of future tokens, which a verification model then evaluates in parallel. The algorithm consists of four main steps:

1. Given a prefix, the draft model generates a sequence of τ candidate tokens.
2. The verification model evaluates all τ predictions simultaneously.
3. Based on the tokens' probabilities, the algorithm determines the maximum number of consecutive predicted tokens that are accepted.
4. The verification model predicts a new token following the accepted tokens, and the entire process repeats from the extended prefix.
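
The four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the source's implementation: the acceptance test in step 3 is assumed to be the common rejection-sampling rule (keep a drafted token x with probability min(1, p(x)/q(x)), where p is the verification model's distribution and q the draft model's), and the names `speculative_step`, `speculative_generate`, `toy_draft`, and `toy_verifier` are hypothetical. The "models" are plain functions returning a probability distribution over a four-token vocabulary, and `tau` plays the role of τ.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: maps a token prefix to a probability
# distribution over a fixed vocabulary (a list of floats summing to 1).
Model = Callable[[Sequence[int]], List[float]]


def sample(dist: Sequence[float], rng: random.Random) -> int:
    """Draw one token index from a (possibly unnormalized) categorical distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(prefix: Sequence[int], draft: Model, verifier: Model,
                     tau: int, rng: random.Random) -> List[int]:
    """One round of speculative decoding; returns the extended sequence."""
    # Step 1: the draft model proposes tau candidate tokens autoregressively.
    ctx = list(prefix)
    drafted: List[int] = []
    q_dists: List[List[float]] = []
    for _ in range(tau):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # Step 2: the verifier scores all tau + 1 positions. A real system does this
    # in one batched forward pass; a plain loop is used here for readability.
    p_dists = [verifier(list(prefix) + drafted[:i]) for i in range(tau + 1)]

    # Step 3: accept the longest run of drafted tokens. Assumed rule
    # (rejection sampling): keep token x with probability min(1, p(x) / q(x)).
    out = list(prefix)
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # First rejection: the verifier's new token (step 4) is drawn from the
        # residual max(0, p - q); fall back to p if the residual is all zero.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng) if sum(residual) > 0 else sample(p, rng))
        return out

    # Step 4: every drafted token was accepted, so the verifier adds one more
    # token from its distribution after the last drafted token.
    out.append(sample(p_dists[tau], rng))
    return out


def speculative_generate(prefix: Sequence[int], draft: Model, verifier: Model,
                         tau: int, max_new: int, rng: random.Random) -> List[int]:
    """Repeat speculative rounds until max_new tokens have been generated."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out = speculative_step(out, draft, verifier, tau, rng)
    return out[: len(prefix) + max_new]


def toy_verifier(ctx: Sequence[int]) -> List[float]:
    """Hypothetical verification model over 4 tokens: favours (last + 1) % 4."""
    nxt = (ctx[-1] + 1) % 4 if ctx else 0
    return [0.7 if t == nxt else 0.1 for t in range(4)]


def toy_draft(ctx: Sequence[int]) -> List[float]:
    """Hypothetical cheaper draft model: a flatter guess at the same pattern."""
    nxt = (ctx[-1] + 1) % 4 if ctx else 0
    return [0.4 if t == nxt else 0.2 for t in range(4)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_generate([0], toy_draft, toy_verifier, tau=3, max_new=8, rng=rng))
```

When the draft distribution equals the verifier's, the ratio p(x)/q(x) is 1, so every drafted token is accepted and each round emits τ + 1 tokens; the further q drifts from p, the more often a round stops early. Each round always yields at least one token, so generation is never slower in token count than plain decoding, and the practical speedup comes from scoring all τ + 1 positions in a single verifier pass.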

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences