General Formula for Prediction via Maximum Probability
The fundamental principle for making a prediction in many machine learning models is to select the output that has the highest probability given an input. This is formally expressed as:

$$\hat{y} = \arg\max_{y \in Y} \Pr(y \mid x)$$

In this formula, $x$ is the input, $Y$ is the set of all possible outputs, $y$ is a candidate output from that set, and $\hat{y}$ is the final predicted output. The prediction $\hat{y}$ is chosen because it maximizes the conditional probability $\Pr(y \mid x)$.
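The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict` and the dictionary representation of the candidate set are assumptions made for the example, and the probabilities are the ones given in the 'The ocean is...' card listed under Related.

```python
def predict(probs):
    """Return the candidate output with the highest conditional probability.

    `probs` is a hypothetical mapping from each candidate output y
    to its conditional probability Pr(y | x) for one fixed input x.
    """
    if not probs:
        raise ValueError("the candidate set must not be empty")
    # max over the keys, ranked by their probability values (the arg max)
    return max(probs, key=probs.get)


# Probabilities for the next word after the input 'The ocean is...'
ocean_probs = {"deep": 0.75, "cold": 0.15, "running": 0.09, "quiet": 0.01}

y_hat = predict(ocean_probs)
print(y_hat)  # deep
```

Only the ranking of the candidates matters here: any score that orders the candidates the same way as the probabilities would yield the same prediction.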

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Ch.5 Inference - Foundations of Large Language Models
Related
General Formula for Prediction via Maximum Probability
Evaluating University Height Estimation Methods
A data scientist is trying to estimate the typical number of daily active users for a new mobile app. They collect data for the first five days: [100, 150, 120, 180, 1000]. The last day was a special promotional event. They consider two different functions to produce a single-value estimate from this data:
- Function A: Calculates the arithmetic average of the five data points.
- Function B: Identifies the middle value after sorting the data points.
Which of the following statements best analyzes the difference between the estimates produced by these two functions in this specific context?
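The two estimates in this question can be computed directly. The sketch below uses Python's standard `statistics` module; the variable names are illustrative assumptions, and the data is the five-day sample quoted in the question.

```python
from statistics import mean, median

# Daily active users for the first five days; the last day was a promotional event
daily_users = [100, 150, 120, 180, 1000]

estimate_a = mean(daily_users)    # Function A: arithmetic average
estimate_b = median(daily_users)  # Function B: middle value after sorting

print("Function A (mean):", estimate_a)
print("Function B (median):", estimate_b)
```

The mean comes out at 310 and the median at 150, so the single promotional-day value pulls Function A well above every ordinary day while leaving Function B unchanged.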
Consider a function designed to produce a single-number estimate of a population's central value from a sample of data. The function works by taking only the minimum and maximum values from the sample and calculating their average. This function is a robust choice for this task, particularly in situations where the sample might contain extreme outliers.
Using Optimized Predictions as Learning Targets
General Formula for Prediction via Maximum Probability
A language model is given the input 'The ocean is...' and calculates the conditional probability for four candidate words to be the next word. Based on the values below, which word would a model that predicts by maximizing probability choose?
- P("deep" | "The ocean is...") = 0.75
- P("cold" | "The ocean is...") = 0.15
- P("running" | "The ocean is...") = 0.09
- P("quiet" | "The ocean is...") = 0.01
Evaluating a Prediction Strategy
Spam Filter Classification
Inference-Time LLM Alignment
General Formula for Prediction via Maximum Probability
Core Topics in LLM Inference
Historical Context of Inference over Sequential Data
Increased Importance of Inference Efficiency with Longer Sequences
A company deploys a fully trained and aligned language model as a creative writing assistant. When a user provides the prompt, 'The old library held a secret...', the model generates a complete, coherent paragraph to continue the story. Which statement accurately describes the core computational process occurring as the model generates this specific paragraph?
Evaluating a Model Deployment Strategy
A team of developers is creating a new large language model for a customer service chatbot. Below are three major stages of the model's lifecycle. Arrange these stages in the correct chronological order, from initial development to deployment for user interaction.
Computational Challenges of LLM Inference
Learn After
LLM Prediction with Full Context
LLM Prediction with Compressed Context
Mathematical Formulation of Prompt Ensembling
Formula for Scoring Reasoning Paths by Counting Correct Steps
A classification model is given an input, x, and must choose an output, y, from the set of possible classes {A, B, C, D}. The model's decision rule is to select the class that has the highest conditional probability, Pr(y|x). Given the following probabilities calculated by the model for the input x, what will its final prediction be?
- Pr(y=A | x) = 0.15
- Pr(y=B | x) = 0.55
- Pr(y=C | x) = 0.25
- Pr(y=D | x) = 0.05
Model Prediction vs. Ground Truth
Analyzing a Model's Prediction Choice