Iterative Process of Speculative Decoding
Speculative decoding is an iterative process. Each step produces a set of new tokens in a single pass: the draft tokens that were accepted, followed by one final token from the verification model. This entire set is appended to the existing context, and the updated, longer context becomes the input for the next iteration.
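The cycle described above can be sketched in a few lines of Python. This is only an illustrative toy, not a reference implementation: the names `speculative_step`, `speculative_decode`, `draft_next`, and `verify_next` are hypothetical, acceptance is assumed to be a greedy exact match against the verification model's own choice, and the verification model is called once per position here, whereas a real system would score all draft positions in one parallel forward pass.

```python
from typing import Callable, List, Sequence

Token = str
NextToken = Callable[[Sequence[Token]], Token]  # hypothetical model interface


def speculative_step(context: Sequence[Token], draft_next: NextToken,
                     verify_next: NextToken, k: int) -> List[Token]:
    """Return the new tokens produced by one draft-and-verify step."""
    # 1. The draft model proposes k tokens, one after another.
    draft: List[Token] = []
    for _ in range(k):
        draft.append(draft_next(list(context) + draft))

    # 2. Accept draft tokens until the first disagreement (assumed greedy rule).
    new_tokens: List[Token] = []
    for tok in draft:
        expected = verify_next(list(context) + new_tokens)
        if tok != expected:
            # Rejection: the verification model's token ends the step.
            new_tokens.append(expected)
            return new_tokens
        new_tokens.append(tok)

    # 3. Every draft token was accepted: the verifier still adds one token.
    new_tokens.append(verify_next(list(context) + new_tokens))
    return new_tokens


def speculative_decode(prompt: Sequence[Token], draft_next: NextToken,
                       verify_next: NextToken, k: int,
                       max_new_tokens: int) -> List[Token]:
    """Repeat the step, appending each step's tokens to the context."""
    context = list(prompt)
    while len(context) - len(prompt) < max_new_tokens:
        # The updated context is the basis for the next iteration.
        context.extend(speculative_step(context, draft_next, verify_next, k))
    return context[:len(prompt) + max_new_tokens]


# Toy "models" that replay fixed sequences (purely for illustration).
TARGET = ['the', 'quick', 'brown', 'sly', 'cat', 'sleeps']
DRAFT = ['the', 'quick', 'brown', 'fox', 'jumps']


def toy_verify(ctx: Sequence[Token]) -> Token:
    return TARGET[len(ctx)] if len(ctx) < len(TARGET) else '<eos>'


def toy_draft(ctx: Sequence[Token]) -> Token:
    return DRAFT[len(ctx)] if len(ctx) < len(DRAFT) else '<unk>'


first_step = speculative_step([], toy_draft, toy_verify, k=5)
full_output = speculative_decode([], toy_draft, toy_verify, k=5, max_new_tokens=6)
```

With these toy models, the first step accepts three draft tokens and then takes the verification model's replacement for the rejected fourth, so four tokens are added at once. Later steps start from that longer context, and under the greedy rule assumed here the final output matches what the verification model would have produced on its own.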
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A text generation process uses a fast 'draft' model to propose a sequence of tokens and a more powerful 'verification' model to check them. In one step, the draft model proposes the five-token sequence ['the', 'quick', 'brown', 'fox', 'jumps']. The verification model accepts the first three tokens ('the', 'quick', 'brown') but rejects the fourth token ('fox'). The verification model then generates its own token, 'sly'. What is the complete set of new tokens added to the main sequence in this single step?
Analysis of a Speculative Generation Step
Iterative Process of Speculative Decoding
In a text generation system using a fast draft model and a more powerful verification model, a single generation step adds the following set of new tokens to the sequence: {'and', 'the', 'lion'}. Based on the principles of this generation method, which of the following scenarios is the only one that could have produced this specific output?
Learn After
In an iterative speculative decoding process, the current context is represented by the sequence of tokens {y_1, ..., y_i}. In the current step, two draft tokens are accepted, {ŷ_{i+1}, ŷ_{i+2}}, and the verification model generates one final token, {ȳ_{i+3}}. What will be the complete input context for the next iteration of this process?
A single iteration of the speculative decoding process involves several key actions. Arrange the following actions in the correct chronological order to represent one complete cycle, starting from a given context.
Debugging a Speculative Decoding Implementation