Definition

Draft Model in Speculative Decoding

The draft model in speculative decoding is a smaller, faster language model that generates candidate tokens with ordinary autoregressive decoding: it proposes one token at a time and appends each token to its context before predicting the next. Its key property is a low cost per token, which lets it produce a short sequence of tokens quickly. It is less accurate than the main (target) model, so its tokens are only guesses. Their purpose is to be plausible enough that the target model, checking several of them at once, accepts many of them. The draft model therefore acts as a fast but imperfect predictor.
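
A minimal sketch of the draft-then-verify loop follows. It makes several assumptions. The "models" are hypothetical toy functions that map a token context to a single next token, standing in for real language models. The verification uses the simple greedy, exact-match rule, not the probabilistic acceptance rule used when sampling. The target model is also called sequentially here for clarity, whereas real systems score all draft tokens in one parallel forward pass. The names `draft_tokens` and `verify_greedy` are illustrative only.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def draft_tokens(draft_model: NextToken, context: List[int], k: int) -> List[int]:
    """Propose k candidate tokens autoregressively with the cheap draft model."""
    ctx = list(context)
    proposals: List[int] = []
    for _ in range(k):
        tok = draft_model(ctx)  # one cheap decoding step per token
        proposals.append(tok)
        ctx.append(tok)         # feed the draft's own output back in
    return proposals


def verify_greedy(target_model: NextToken, context: List[int],
                  proposals: List[int]) -> List[int]:
    """Greedy verification (assumed variant): accept the longest prefix of
    proposals that matches the target's own predictions. At the first mismatch,
    emit the target's token instead and stop. If every proposal matches, emit one
    extra target token. Real systems compute these target predictions in a single
    parallel forward pass; this loop is sequential only for readability."""
    ctx = list(context)
    accepted: List[int] = []
    for tok in proposals:
        expected = target_model(ctx)
        if tok != expected:
            accepted.append(expected)  # target's correction replaces the rejected draft
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_model(ctx))  # bonus token when all drafts are accepted
    return accepted


# Toy stand-ins: the draft always counts upward; the target agrees except after 3.
def toy_draft(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_target(ctx: Sequence[int]) -> int:
    return 7 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    guesses = draft_tokens(toy_draft, [0], k=4)
    print(guesses)                                # [1, 2, 3, 4]
    print(verify_greedy(toy_target, [0], guesses))  # [1, 2, 3, 7]
```

In this toy run the draft model proposes four tokens. The target accepts the first three, rejects the fourth, and substitutes its own token, so one round yields four tokens. This illustrates why a fast but imperfect predictor can still save target-model calls.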

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences