Learn Before
Conditional Probability of the Next Token
The conditional probability of a token given all of its preceding context tokens is a fundamental concept in language modeling. It is mathematically denoted as P(xₜ | x₀, x₁, …, xₜ₋₁). This probability represents the likelihood of the specific token xₜ appearing next in a sequence after the preceding tokens x₀ through xₜ₋₁ have been observed.
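In a neural language model, this conditional distribution is typically produced by applying a softmax to the model's output logits over the vocabulary. The sketch below illustrates the idea with a toy four-word vocabulary and made-up logit values; the numbers and the context string are assumptions for illustration only, not values from the course.

```python
import numpy as np

# Toy vocabulary and hypothetical logits a model might output
# after reading the context "The quick brown". All values here
# are made up for illustration.
vocab = ["fox", "dog", "cat", "car"]
logits = np.array([3.2, 1.1, 0.8, -1.5])

# Softmax converts raw scores into a probability distribution
# over the vocabulary: P(x_t = w | x_0, ..., x_{t-1}) for each w.
# Subtracting the max logit first is a standard numerical-stability trick.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"P({word!r} | 'The quick brown') = {p:.3f}")
```

Note that the probabilities sum to 1 across the whole vocabulary: the model assigns some likelihood to every possible next token, and sampling or ranking happens over this full distribution.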

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Chain Rule for Sequence Probability
Conditional Probability of the Next Token
A model is generating a sequence of words. It has already produced the words 'The', 'quick', 'brown'. According to the principle of autoregressive conditional probability, which expression correctly represents the likelihood that the next word will be 'fox', given the preceding words?
Defining Probability for a Token in a Sequence
A model is generating a sequence of elements (x₀, x₁, x₂, x₃, ...). To calculate the probability of the fourth element (x₃), the model's calculation must be conditioned on the entire preceding subsequence (x₀, x₁, x₂). A simplified model that conditions the probability of x₃ only on the immediately preceding element (x₂) would still be correctly applying the principle of autoregressive conditional probability.
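For concreteness, the full autoregressive conditioning described above corresponds to the chain-rule factorization of the sequence probability, written out here for the four-element sequence as a clarifying sketch (a standard probability identity):

```latex
P(x_0, x_1, x_2, x_3)
  = P(x_0)\, P(x_1 \mid x_0)\, P(x_2 \mid x_0, x_1)\, P(x_3 \mid x_0, x_1, x_2)
```

Conditioning x₃ only on x₂ would replace the final factor with P(x₃ | x₂), which is the first-order Markov simplification rather than full autoregressive conditioning on the entire preceding subsequence.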
Learn After
Schematic of Probability Calculation in Causal Language Modeling
An autoregressive language model is given the sequence of tokens: 'The', 'cat', 'sat', 'on', 'the'. It is now tasked with predicting the very next token. Which of the following expressions correctly represents the primary calculation the model performs to determine the likelihood of the word 'mat' appearing next?
Contextual Influence on Token Probability
Analyzing Contextual Influence on Next-Token Probability
You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Neural Network-Based Next-Token Probability Distribution
Initial Token Probability Assumption