Modeling and Efficient Computation of Conditional Token Probabilities
A crucial aspect of implementing autoregressive language models involves two interconnected tasks: first, defining a model for the conditional probability of the next token, Pr(yi|x, y_{<i}), and second, ensuring that this probability can be calculated in a computationally efficient way.
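As an illustration (not from the source text), the sketch below shows one common way to obtain Pr(y_i | x, y_{<i}): apply a softmax to the model's next-token logits and read off the entry for the candidate token. Here `model_logits`, the toy vocabulary size, and the random scoring are hypothetical stand-ins for a trained language model.

```python
# Minimal sketch: conditional next-token probability from a logit vector.
# `model_logits` is a hypothetical stand-in for a trained LM.
import numpy as np

VOCAB_SIZE = 8  # toy vocabulary for illustration

def model_logits(context_ids):
    """Hypothetical model: returns one logit per vocabulary token for a given context."""
    rng = np.random.default_rng(sum(context_ids))  # deterministic toy scores
    return rng.normal(size=VOCAB_SIZE)

def next_token_prob(context_ids, token_id):
    """Pr(y_i = token_id | x, y_{<i}), where context_ids holds x followed by y_{<i}."""
    logits = model_logits(context_ids)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over the vocabulary
    return probs[token_id]

# Example: probability that token 3 follows the context [0, 5, 2]
print(next_token_prob([0, 5, 2], 3))
```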
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Mathematical Formulation of LLM Inference
Equivalence of Maximizing Auto-regressive Log-Likelihood and Minimizing Cross-Entropy Loss
Conditional vs. Joint Probability Objectives in Language Modeling
Notational Convention for Autoregressive Conditional Probability
Modeling and Efficient Computation of Conditional Token Probabilities
A language model is generating a response sequence 'y' given an input context 'x'. The model generates the two-token sequence y = ('deep', 'learning'). The model's calculated log-probabilities for each step of the generation are as follows:
- Log-probability of the first token: log Pr(y_1 = 'deep' | x) = -0.7
- Log-probability of the second token, given the first: log Pr(y_2 = 'learning' | x, y_1 = 'deep') = -0.4
Based on the standard method for calculating the probability of a full sequence, what is the total conditional log-likelihood of the entire sequence 'y', i.e., log Pr(y|x)?
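A minimal sketch of the chain-rule sum for this example; the -0.7 and -0.4 values come from the question above, and the code itself is purely illustrative.

```python
# log Pr(y | x) = log Pr(y_1 | x) + log Pr(y_2 | x, y_1)
step_log_probs = [-0.7, -0.4]        # per-token log-probabilities from the question
total_log_likelihood = sum(step_log_probs)
print(total_log_likelihood)          # -1.1, so Pr(y|x) = exp(-1.1) ≈ 0.33
```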
Comparing Model Confidence via Log-Likelihood
Analyzing a Flawed Log-Likelihood Calculation
Model-Specific Optimizations for LLM Inference
Modeling and Efficient Computation of Conditional Token Probabilities
Efficient Generation of Candidate Solutions via Search Algorithms
An AI research team is developing a new generative model for creating complex musical compositions. They find that while their model can accurately calculate the probability of any given short musical phrase, generating a full, high-quality, multi-minute symphony is computationally intractable because they cannot feasibly check every possible combination of notes to find the absolute best one. How does this team's challenge relate to the broader field of artificial intelligence?
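One common way to sidestep exhaustive enumeration is a heuristic search such as beam search, which keeps only a few of the highest-scoring partial sequences at each step instead of all possible continuations. The sketch below is illustrative only; `log_prob_next`, the toy note vocabulary, and the uniform scores are hypothetical placeholders for a real generative model.

```python
# Sketch of beam search: explore only the best `beam_width` prefixes per step.
import math

def log_prob_next(prefix):
    """Hypothetical scorer: returns {token: log-prob} for extending a prefix."""
    vocab = ["C", "D", "E", "G"]                     # toy "note" vocabulary
    return {t: math.log(1.0 / len(vocab)) for t in vocab}

def beam_search(beam_width=2, max_len=4):
    beams = [((), 0.0)]                              # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, lp in log_prob_next(prefix).items():
                candidates.append((prefix + (token,), score + lp))
        # Keep only the best beam_width partial sequences instead of all of them.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())
```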
Comparing Computational Challenges in AI Tasks
Identifying Common Computational Structures in AI
Accuracy-Efficiency Trade-off in LLM Inference
Learn After
Language Model Design Trade-offs
When designing an autoregressive language model, a key decision is how to model the conditional probability of the next token given the context, Pr(y_i | x, y_{<i}). Consider two approaches:
- Approach 1: Uses a fixed-size window, considering only the k most recent previous tokens (y_{i-k}, ..., y_{i-1}) to predict the next token y_i.
- Approach 2: Processes the entire preceding sequence (y_{<i}) to predict the next token y_i.
Which statement best analyzes the fundamental trade-off between these two approaches regarding the modeling and efficient computation of this probability?
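To make the trade-off concrete, the sketch below (hypothetical, not tied to any particular architecture) contrasts how much context each approach feeds to the model at step i: a fixed window keeps the per-step input size roughly constant but discards long-range dependencies, while the full prefix preserves them at a cost that grows with sequence length.

```python
# Illustrative contrast between the two approaches: what context reaches the model.
def window_context(x, y_prev, k):
    """Approach 1: condition only on the k most recent tokens of (x + y_prev)."""
    full = list(x) + list(y_prev)
    return full[-k:]                  # fixed-size input -> roughly constant cost per step

def full_context(x, y_prev):
    """Approach 2: condition on the entire preceding sequence."""
    return list(x) + list(y_prev)     # input grows with i -> per-step cost grows too

x = ["the", "model", "reads", "a", "long", "prompt"]
y_prev = ["and", "then", "generates"]
print(window_context(x, y_prev, k=4))  # last 4 tokens only -> limited dependencies
print(full_context(x, y_prev))         # all 9 tokens -> long-range dependencies, higher cost
```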
Computational Scaling in Autoregressive Models