Acceptance-Rejection Mechanism for Speculative Decoding
In speculative decoding, a speculated token is accepted or rejected by comparing its probability from a draft model ($q$) with that from a target model ($p$). If the draft model's probability for a token, $q(\hat{y}_{i+t})$, is greater than the target model's probability, $p(\hat{y}_{i+t})$, the token is rejected with a probability of $1 - \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})}$. Otherwise, if $q(\hat{y}_{i+t}) \le p(\hat{y}_{i+t})$, the token is accepted outright.
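
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `rejection_probability` and `accept_token` and the `rng` parameter are hypothetical, and drawing a uniform random number to realize the rejection probability is an assumption about how the check is commonly carried out.

```python
import random


def rejection_probability(q_prob: float, p_prob: float) -> float:
    """Probability of rejecting a drafted token.

    q_prob: draft-model probability q(y_hat) of the proposed token.
    p_prob: target-model probability p(y_hat) of the same token.
    """
    if q_prob <= 0.0:
        raise ValueError("a proposed token must have positive draft probability")
    if q_prob <= p_prob:
        # The target model is at least as confident: accept outright.
        return 0.0
    # The draft model is over-confident: reject with probability 1 - p/q.
    return 1.0 - p_prob / q_prob


def accept_token(q_prob: float, p_prob: float, rng=None) -> bool:
    """Randomly accept or reject one drafted token (hypothetical helper)."""
    rng = rng or random.Random()
    u = rng.random()  # assumed: u is drawn uniformly from [0, 1)
    # u >= r holds with probability 1 - r, so the token is rejected
    # with exactly the probability computed above.
    return u >= rejection_probability(q_prob, p_prob)
```

For example, with $q = 0.8$ and $p = 0.6$, `rejection_probability(0.8, 0.6)` is about 0.25, so the token is kept roughly three times out of four; with $q = 0.5$ and $p = 0.7$ it is always kept.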

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Acceptance-Rejection Mechanism for Speculative Decoding
Inequality Constraint for Predicted Future Value Functions ()
Condition for Rejecting Speculation
Consider a text generation system that uses a fast, approximate model to propose a potential future word. For each proposed word, a more accurate but slower model also calculates a probability. Suppose at a certain step i, the fast model predicts the next word will be 'universe' (represented as ). The fast model's confidence in this specific prediction is calculated and denoted as . Based on this information, what is the most accurate interpretation of the value 0.8?
In the context of a system that generates sequences of values (like words in a sentence), the expression is often used. Match each component of this expression to its correct description.
Rejection Criterion in Speculative Sampling
Interpreting Model Predictions
Evaluation Model for Predicted Sequences
A text-generation model has produced the sequence 'The quick brown fox'. The model is now at the 4th position (after 'fox') and is predicting the next word in the sequence. It predicts the word 'jumps' will appear at the 5th position. Which of the following expressions correctly represents the probability assigned by the model to this specific prediction?
In the context of a model generating a sequence of values, match each component of the notation $p(\hat{y}_{i+t})$ to its correct description.
Interpreting a Weather Forecasting Model's Output
Learn After
Determining the Maximum Number of Consecutively Accepted Tokens in Speculative Decoding
Role of the Uniformly Distributed Random Variable () in Speculative Decoding
In a text generation process, a small, fast model proposes the next token as 'learning' with a probability of 0.8. A larger, more accurate model then evaluates this same token and assigns it a probability of 0.6. Based on the standard acceptance-rejection procedure used in this context, what is the outcome for the token 'learning'?
Evaluating Proposed Tokens in a Generation Process
In a text generation process that uses a draft model and a target model, if the draft model assigns a higher probability to a proposed token than the target model does, that token is automatically rejected.