Short Answer

Deconstructing Next-Token Prediction

An autoregressive language model is in the process of generating a response. So far, it has produced the sequence of tokens ['The', 'cat', 'sat', 'on']. The model now needs to calculate the probability of the next token. Using the formula $Pr_{\hat{\theta}}(x_{i+1} \mid x_0, \ldots, x_i)$, identify what the terms $x_{i+1}$ and $(x_0, \ldots, x_i)$ represent in this specific scenario.
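The setup in the question can be sketched numerically. Below is a minimal toy example, assuming made-up logits over a three-word vocabulary (a real model would produce logits over its full vocabulary): the context $(x_0, \ldots, x_i)$ is the prefix generated so far, and $x_{i+1}$ is the candidate next token whose probability the softmax assigns.

```python
import math

# Hypothetical example: context and logits are invented for illustration.
context = ["The", "cat", "sat", "on"]  # (x_0, ..., x_i): the tokens produced so far

# Toy unnormalized scores for a few candidate next tokens x_{i+1}
vocab_logits = {"the": 2.0, "mat": 3.5, "roof": 1.0}

# Softmax turns logits into Pr(x_{i+1} | x_0, ..., x_i)
z = sum(math.exp(v) for v in vocab_logits.values())
probs = {tok: math.exp(v) / z for tok, v in vocab_logits.items()}

# Under greedy decoding, x_{i+1} is the highest-probability candidate
x_next = max(probs, key=probs.get)
```

Here each key of `probs` is a possible value of $x_{i+1}$, while `context` plays the role of the conditioning sequence $(x_0, \ldots, x_i)$.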

Updated 2025-10-04

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science