1Cademy - Formula for the Number of Consecutively Accepted Tokens in Speculative Decoding

Learn Before

Determining the Maximum Number of Consecutively Accepted Tokens in Speculative Decoding
Role of the Uniformly Distributed Random Variable ($r_t$) in Speculative Decoding

Formula

Formula for the Number of Consecutively Accepted Tokens in Speculative Decoding

The number of consecutively accepted tokens from the start of a speculated sequence, denoted by $n_a$, is determined by finding the index of the first rejected token:

$n_a = \min \left\{ t-1 \mid 1 \le t \le \tau,\ r_t > \frac{p(\hat{y}_{i+t})}{q(\hat{y}_{i+t})} \right\}$

Here, $t$ is the position of the speculated token being evaluated (from $1$ to $\tau$), and $r_t$ is a random variable drawn from the uniform distribution $U(0, 1)$. Token $\hat{y}_{i+t}$ is rejected when $r_t$ exceeds the ratio $p(\hat{y}_{i+t}) / q(\hat{y}_{i+t})$. The minimum picks out the first position $t$ at which this rejection condition holds, so $t-1$ counts the tokens accepted before it. If no position satisfies the condition, the set is empty and all $\tau$ speculated tokens are accepted, i.e. $n_a = \tau$.
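The formula can be read as a simple left-to-right scan. The following is a minimal Python sketch of that scan, not a reference implementation. The function name `count_accepted` and its list-based inputs are hypothetical, chosen only for illustration. It assumes the per-token probabilities $p(\hat{y}_{i+t})$ and $q(\hat{y}_{i+t})$ and the draws $r_t$ have already been computed, that every $q$ value is positive, and that $n_a = \tau$ when no token is rejected.

```python
from typing import Sequence


def count_accepted(
    p_probs: Sequence[float],
    q_probs: Sequence[float],
    r: Sequence[float],
) -> int:
    """Return n_a, the number of consecutively accepted speculated tokens.

    Hypothetical helper for illustration. Index t-1 in each sequence holds
    the value for speculated position t (1 <= t <= tau):
      p_probs[t-1] = p(y_hat_{i+t}), q_probs[t-1] = q(y_hat_{i+t}), r[t-1] = r_t.
    Assumes every q value is strictly positive.
    """
    tau = len(r)
    if len(p_probs) != tau or len(q_probs) != tau:
        raise ValueError("p_probs, q_probs and r must all have length tau")

    for t in range(1, tau + 1):
        ratio = p_probs[t - 1] / q_probs[t - 1]
        # Rejection condition from the formula: r_t > p / q.
        if r[t - 1] > ratio:
            return t - 1  # tokens 1 .. t-1 were accepted

    # Assumption: no rejection means all tau tokens are accepted.
    return tau


# Ratios are 1.25, 0.25, 1.0.
p = [0.5, 0.2, 0.9]
q = [0.4, 0.8, 0.9]
print(count_accepted(p, q, [0.7, 0.3, 0.5]))  # 1 (position 2 rejected: 0.3 > 0.25)
print(count_accepted(p, q, [0.7, 0.1, 0.5]))  # 3 (nothing rejected)
```

The scan stops at the first rejection, so later draws have no effect on $n_a$. This matches the "consecutively accepted from the start" reading of the formula.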

Updated 2026-05-05
