1Cademy - LLM Probability Distribution Notation ($Pr_\theta^s(\cdot)$)

Learn Before

Conditional Probability $Pr^t(y|c, z)$

Formula

LLM Probability Distribution Notation ($Pr_\theta^s(\cdot)$)

The notation $Pr_\theta^s(\cdot)$ represents the probability distribution generated by a Large Language Model (LLM). Here, $Pr$ stands for probability, the subscript $\theta$ denotes the set of parameters that define the model, and the superscript $s$ can indicate a specific sampling strategy or version of the model. The placeholder $(\cdot)$ marks where the argument goes: taken as a whole, the expression is a function that assigns a probability to a given output (such as a word or token), determined by the parameters $\theta$ and some input context.
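To make the roles of $\theta$ and $s$ concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any real LLM is implemented: `theta` is a hypothetical lookup table from a context string to logits over a three-word vocabulary, and `s` is read as a sampling strategy (one of the two readings the definition allows) that reshapes the base distribution. The names `pr`, `greedy`, `top_k`, `theta_a`, and `theta_b` are all invented for illustration.

```python
import math

VOCAB = ["innovative", "new", "old"]  # hypothetical toy vocabulary


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def full(probs):
    """Strategy s: sample from the unmodified distribution."""
    return list(probs)


def greedy(probs):
    """Strategy s: put all probability mass on the most likely token."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def top_k(k):
    """Strategy s: keep the k most likely tokens and renormalize."""
    def strategy(probs):
        order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        keep = set(order[:k])
        mass = sum(probs[i] for i in keep)
        return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
    return strategy


def pr(theta, s, context):
    """Toy stand-in for Pr_theta^s(. | context).

    theta: assumed here to be a dict mapping a context to logits over VOCAB.
    s:     a function that reshapes the base distribution.
    """
    return s(softmax(theta[context]))


# Two hypothetical parameter sets (e.g. before and after fine-tuning).
theta_a = {"The product is": [2.0, 1.0, 0.0]}
theta_b = {"The product is": [3.0, 1.0, 0.0]}

context = "The product is"
idx = VOCAB.index("innovative")
p_a = pr(theta_a, full, context)[idx]        # same s, parameters theta_a
p_b = pr(theta_b, full, context)[idx]        # same s, parameters theta_b
p_topk = pr(theta_a, top_k(2), context)      # same theta_a, different s
```

Changing `theta_a` to `theta_b` while holding the strategy fixed corresponds to changing the subscript; swapping `full` for `greedy` or `top_k(2)` while holding the parameters fixed corresponds to changing the superscript.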

Updated 2025-10-08


References

Reference of Foundations of Large Language Models Course

Learn After

A researcher is comparing two language models. Model A is defined by a set of parameters $\theta_A$. Model B is a version of Model A that has been fine-tuned on a new dataset, resulting in a new set of parameters, $\theta_B$. The researcher wants to compare the probability of each model generating the word 'innovative' given the same input context and using the same sampling strategy, $s$. Which of the following mathematical expressions accurately represents this comparison?
An AI engineer is working with a pre-trained Large Language Model, whose probability distribution is represented by $Pr_\theta^s(\cdot)$. The engineer decides to change the method used to select the next word from the model's output probabilities, switching from a greedy approach to a top-k sampling approach. The model's underlying weights and biases are not modified. Which component of the notation $Pr_\theta^s(\cdot)$ would need to be updated to reflect this change?
Analyzing Model Update Notation
Policy Notation for Autoregressive Models ($\pi_\theta$)
