In a speculative decoding process, a draft model q generates a sequence of candidate tokens. The probability distribution for the t-th candidate token in the sequence, y_{i+t}, is conditioned on the original input X, the verified token sequence Y_{≤i}, and one other crucial set of tokens. Complete the formal expression for this conditional probability: Pr_q(y_{i+t} | X, Y_{≤i}, ______).
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Mathematical Formulation of Draft Model Prediction in Speculative Decoding
Imagine a text generation system where a small, fast model first generates a short sequence of candidate tokens (e.g., C1, C2, C3). Then, a large, accurate model checks all these candidates at once. Let's say the system has already produced a confirmed sequence of tokens: ['The', 'cat', 'sat']. The small model has just generated two candidate tokens in the current step: ['on', 'the']. What information does the small model use to calculate the probability distribution for the next candidate token (C3)?

Consider a speculative decoding process where a draft model is generating a sequence of three candidate tokens (ŷ₁, ŷ₂, ŷ₃) after a verified prefix. The probability distribution used to select the third token, ŷ₃, is calculated independently of the first two candidate tokens, ŷ₁ and ŷ₂.
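The drafting step described above can be sketched in code. The following is a minimal illustration, not a real model: `toy_draft_model` is a hypothetical stand-in for the draft distribution Pr_q, used only to show that the context for each candidate grows autoregressively to include the input X, the verified prefix Y_{≤i}, and all candidates drafted so far in the current step.

```python
def toy_draft_model(context):
    """Hypothetical stand-in for the draft model q: deterministically
    maps a context (list of tokens) to a next token."""
    return f"tok{len(context)}"

def draft_candidates(X, Y_verified, k):
    """Autoregressively draft k candidate tokens with the small model.

    At step t, the context is X + Y_{<=i} + the t-1 candidates already
    drafted in this round, mirroring the conditioning in the card's
    formal expression.
    """
    candidates = []
    for t in range(k):
        context = X + Y_verified + candidates  # earlier candidates included
        candidates.append(toy_draft_model(context))
    return candidates

# Example matching the card: verified prefix ['The', 'cat', 'sat'],
# drafting three candidates in one speculative step.
cands = draft_candidates(["<prompt>"], ["The", "cat", "sat"], 3)
```

Each iteration's context includes the candidates from earlier iterations, which is why ŷ₃ is not independent of ŷ₁ and ŷ₂; the large model then verifies all k candidates in a single forward pass.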