Formula

Formula for the Number of Consecutively Accepted Tokens in Speculative Decoding

The number of consecutively accepted tokens from the start of a speculated sequence, denoted by $n_a$, is determined by finding the index of the first rejected token. The formula is:

$$n_a = \min \left\{ t-1 \;\middle|\; 1 \le t \le \tau,\ r_t > \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})} \right\}$$

Here, $t$ is the index of the token being evaluated (from $1$ to $\tau$), and $r_t$ is a variable drawn from the uniform distribution $U(0, 1)$. The formula identifies the minimum index $t$ for which the rejection condition is met, and $t-1$ gives the count of all preceding, consecutively accepted tokens.
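
The formula can be illustrated with a minimal Python sketch. The function name `count_accepted` and the parameters `p_probs`, `q_probs`, and `rng` are hypothetical names chosen for this example, where `p_probs[t-1]` and `q_probs[t-1]` stand for $p(\hat{y}_{i+t})$ and $q(\hat{y}_{i+t})$. Two assumptions go beyond the text above: every $q$ value is strictly positive, and when no token is rejected (the set in the formula is empty) the function returns $\tau$, meaning all speculated tokens are accepted.

```python
import random
from typing import Optional, Sequence


def count_accepted(
    p_probs: Sequence[float],
    q_probs: Sequence[float],
    rng: Optional[random.Random] = None,
) -> int:
    """Return n_a, the number of consecutively accepted speculated tokens.

    p_probs[t-1] and q_probs[t-1] play the roles of p(y_hat_{i+t}) and
    q(y_hat_{i+t}) for t = 1..tau (hypothetical argument names).
    """
    if len(p_probs) != len(q_probs):
        raise ValueError("p_probs and q_probs must have the same length")
    rng = rng or random.Random()

    for t, (p_t, q_t) in enumerate(zip(p_probs, q_probs), start=1):
        r_t = rng.random()      # r_t drawn from U(0, 1)
        if r_t > p_t / q_t:     # rejection condition from the formula
            return t - 1        # tokens 1..t-1 were accepted
    # Assumption: no rejection means all tau tokens are accepted.
    return len(p_probs)


# Every ratio p/q is at least 1, so no draw from U(0, 1) can exceed it.
print(count_accepted([0.6, 0.5, 0.9], [0.3, 0.5, 0.4]))  # 3
```

Scanning the tokens in order and returning at the first rejection mirrors the $\min$ in the formula: tokens after the first rejected position are never evaluated, so they cannot contribute to $n_a$.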

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
