Learn Before
LLM Probability Distribution Notation
This notation represents the probability distribution generated by a Large Language Model (LLM). In the formula, Pr stands for probability, the subscript denotes the set of parameters that define the model, and the superscript can indicate a specific sampling strategy or a version of the model. The expression as a whole is a function that gives the probability of a particular output (such as a word or token), given the model's parameters and some input context.
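The roles of the subscript and superscript can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `theta_a`, `theta_b`, `prob`, and the strategy labels "full" and "top_k" are hypothetical, and the "model" is just a lookup table of scores rather than a real LLM. The parameters play the part of the subscript, the strategy plays the part of the superscript, and the context is the conditioning input.

```python
import math


def softmax(scores):
    """Turn a dict of token -> score into a dict of token -> probability."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def prob(theta, context, strategy="full", k=2):
    """Toy stand-in for the notation: `theta` is the subscript (parameters),
    `strategy` is the superscript, `context` is the conditioning input."""
    scores = theta[context]  # hypothetical: parameters are a table of scores
    if strategy == "top_k":
        # Keep only the k highest-scoring tokens, then renormalize.
        kept = sorted(scores, key=scores.get, reverse=True)[:k]
        scores = {tok: scores[tok] for tok in kept}
    elif strategy != "full":
        raise ValueError(f"unknown strategy: {strategy}")
    return softmax(scores)


# Two hypothetical parameter sets; the second favors 'blue' more strongly.
theta_a = {"The sky is": {"blue": 2.0, "grey": 1.0, "falling": 0.0}}
theta_b = {"The sky is": {"blue": 3.0, "grey": 1.0, "falling": 0.0}}

context = "The sky is"
print(prob(theta_a, context)["blue"])                     # parameters A, full distribution
print(prob(theta_b, context)["blue"])                     # parameters B, full distribution
print(prob(theta_a, context, strategy="top_k")["blue"])   # parameters A, top-k strategy
```

Changing `theta` while holding `strategy` fixed corresponds to changing the subscript only; changing `strategy` while holding `theta` fixed corresponds to changing the superscript only.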

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A predictive text model is being trained. At an early stage of training (state t=100), it is given the context c = 'The sky is' and an additional instruction z = 'use a common color'. The model calculates the probability of the next word y = 'blue' as Pr^100('blue' | c, z) = 0.2. After extensive training (state t=5000), the model re-evaluates the same inputs and finds the probability to be Pr^5000('blue' | c, z) = 0.8. What is the most accurate interpretation of this change?
A language model is being prompted to generate a JSON object. The model is at training step 5000. The prompt is: 'Given the user's request to find a coffee shop, provide the output in a structured format.' The model is considering 'name' as the next part of the output. Match each component of the probability expression Pr^t(y|c, z) to its corresponding element in this scenario.
Analyzing Chatbot Behavior with Conditional Probability
Learn After
A researcher is comparing two language models. Model A is defined by a set of parameters. Model B is a version of Model A that has been fine-tuned on a new dataset, resulting in a new set of parameters. The researcher wants to compare the probability of each model generating the word 'innovative' given the same input context and the same sampling strategy. Which of the following mathematical expressions accurately represents this comparison?
An AI engineer is working with a pre-trained Large Language Model whose probability distribution is written in this notation. The engineer decides to change the method used to select the next word from the model's output probabilities, switching from a greedy approach to a top-k sampling approach. The model's underlying weights and biases are not modified. Which component of the notation would need to be updated to reflect this change?
Policy Notation for Autoregressive Models (π_θ)
Analyzing Model Update Notation