Learn Before
Speculative Decoding Algorithm
The speculative decoding algorithm accelerates text generation by using a small, fast draft model to predict a sequence of future tokens, which a larger verification model then evaluates in parallel. The algorithm has four main steps. First, the draft model generates a sequence of candidate tokens given a prefix. Second, the verification model evaluates all of these candidates in a single parallel pass. Third, the maximum number of consecutively accepted draft tokens is determined by comparing the models' probabilities. Finally, the verification model supplies a new token following the accepted tokens, and the entire process repeats from the extended prefix.
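The four steps above can be sketched in code. This is a minimal sketch assuming greedy (argmax) decoding for both models; the function names and the toy deterministic "models" over integer tokens are illustrative stand-ins for real networks, not part of the original description.

```python
# Sketch of one speculative-decoding cycle under greedy decoding.
# `draft_next` and `verify_next` are hypothetical stand-ins for the
# draft and verification models: each maps a context tuple to its
# argmax next token.

def speculative_step(prefix, draft_next, verify_next, k=4):
    """Run one cycle: draft k tokens, verify them in parallel,
    accept the longest agreeing run, then take one verifier token."""
    # Step 1: the draft model proposes k candidate tokens autoregressively.
    draft = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(tuple(ctx))
        draft.append(t)
        ctx.append(t)

    # Step 2: the verifier scores all k positions at once. With a real
    # LLM this is a single batched forward pass; here we simulate it
    # with one call per position.
    verifier_preds = [
        verify_next(tuple(prefix) + tuple(draft[:i])) for i in range(k)
    ]

    # Step 3: accept the longest run of draft tokens the verifier agrees with.
    n_accept = 0
    while n_accept < k and draft[n_accept] == verifier_preds[n_accept]:
        n_accept += 1

    # Step 4: the verifier supplies the next token after the accepted run:
    # its correction at the first mismatch, or, if every draft token was
    # accepted, its prediction for the position after the whole draft.
    if n_accept < k:
        bonus = verifier_preds[n_accept]
    else:
        bonus = verify_next(tuple(prefix) + tuple(draft))
    return list(prefix) + draft[:n_accept] + [bonus]


# Toy models: the verifier always continues with last+1; the draft
# agrees except after token 2, where it wrongly proposes 9.
def verify_next(ctx):
    return ctx[-1] + 1

def draft_next(ctx):
    return ctx[-1] + 1 if ctx[-1] != 2 else 9

print(speculative_step([0], draft_next, verify_next, k=4))  # [0, 1, 2, 3]
```

In this run the verifier accepts the first two draft tokens (1, 2), rejects the third (9), and emits its own correction (3), so three new tokens are produced for a single verification pass; this amortization over accepted tokens is where the speedup comes from.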
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Two-Model Architecture of Speculative Decoding
Speculative Decoding Algorithm
Evaluating an Inference Optimization Technique
A team is implementing an inference optimization technique where a small, fast model proposes a sequence of several tokens, and a large, accurate model then validates this entire sequence in a single step. What is the most critical factor for this technique to achieve a significant speedup compared to generating tokens one by one with the large model?
A development team implements an inference optimization method using a small, fast model to propose several tokens at once, which are then checked by a larger, more accurate model. They are surprised to find that the overall generation speed is nearly identical to using only the large model. Which of the following scenarios best explains this lack of performance improvement?
Learn After
Parallel Verification in Speculative Decoding
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Conditional Probability Distribution of the Draft Model in Speculative Decoding
Evaluation of Draft Tokens by the Verification Model
Structure of the Full Sequence After a Speculative Decoding Step
A text generation system uses two models: a small, fast 'draft' model and a large, accurate 'verification' model to speed up output. Arrange the following events to correctly represent one cycle of this generation process, starting from a given text prefix.
A text generation system uses a fast 'draft' model and a more accurate 'verification' model. The draft model proposes the 4-token sequence:
[jumped, over, the, moon]. The verification model then evaluates this sequence and determines that the first two tokens (jumped, over) are correct, but the third token (the) is incorrect. Based on the rules of this generation algorithm, what is the immediate result of this verification step?
Efficiency Limits of a Two-Model Generation System
Diagnosing a Speculative Decoding Slowdown in Production
Choosing τ and Model Roles for Low-Latency Speculative Decoding
Tuning Speculative Decoding Under a Fixed Verification Budget
Interpreting a Speculative Decoding Trace and Identifying the Bottleneck
Explaining a “Fast but Wrong” Speculative Decoding Regression
Root-Causing Low Speedup Despite Parallel Verification
Designing a Speculative Decoding Control Policy for a Latency-Sensitive Product
In a production LLM service using speculative deco...
You are reviewing logs from a production LLM endpo...
You are implementing speculative decoding in a cus...