Maximum Likelihood Estimation (MLE) as the Objective for Supervised Fine-Tuning
In Supervised Fine-Tuning (SFT), the training objective is to maximize the probability of the model generating a gold-standard (ground-truth) output y given a specific input x. This is achieved through Maximum Likelihood Estimation (MLE): the model's parameters θ are adjusted so that its predicted token distributions align as closely as possible with the one-hot distributions of the correct response. The formal objective is to maximize the conditional probability Pr(y | x; θ), i.e. to find the parameters θ that make the gold response most likely given its input.
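The objective can be sketched in plain Python: maximizing Pr(y | x; θ) is equivalent to minimizing the summed negative log-probability of each gold token. The function name `sft_nll` and the toy distributions below are illustrative, not from the source.

```python
import math

def sft_nll(step_distributions, target_tokens):
    """Negative log-likelihood of the ground-truth response y given input x.

    step_distributions: one dict per response position, mapping candidate
        tokens to the model's predicted probability P(token | x, y_<t).
    target_tokens: the gold-standard response tokens y_1 .. y_n.
    Minimizing this sum is the same as maximizing Pr(y | x; theta).
    """
    return -sum(math.log(dist[tok])
                for dist, tok in zip(step_distributions, target_tokens))

# Toy two-token response ("Paris", ".") with illustrative probabilities.
dists = [{"Paris": 0.25, "London": 0.05}, {".": 0.90, "!": 0.05}]
loss = sft_nll(dists, ["Paris", "."])
```

Training lowers this loss by pushing probability mass toward the gold token at each position, which is exactly the "align with the one-hot distribution" behavior described above.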
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Maximum Likelihood Estimation (MLE) as the Objective for Supervised Fine-Tuning
A development team is fine-tuning a pre-trained language model using a curated dataset of customer support inquiries (inputs) and their corresponding ideal, human-written responses (outputs). The aim is to create a specialized chatbot that reliably provides answers in the same helpful and accurate style as the examples. From a probabilistic perspective, which statement best describes the fundamental objective of this training process?
Correcting a Flawed Fine-Tuning Objective
Objective for a Specialized Math Tutor
Mathematical Formulation of the Supervised Fine-Tuning Objective
Conditional vs. Joint Probability Objectives in Language Modeling
Instruction Fine-Tuning
Potential for Undesirable Content Generation After SFT
Example of SFT: Question-Answering Task
Applicability of Supervised Fine-Tuning
Practical Implementation Challenges of SFT
Maximum Likelihood Estimation (MLE) as the Objective for Supervised Fine-Tuning
Instruction Fine-Tuning as a Technique of SFT
Size and Specialization of SFT Datasets
Generalization as an Outcome of SFT
Characteristics of SFT Datasets
Generalization from Supervised Fine-Tuning
Definition of SFT Datasets
A development team starts with a base language model that has been pre-trained on a massive, general-purpose dataset from the web. To make the model a specialized customer service chatbot, the team initiates a second phase of training. How would the dataset used in this second phase most likely differ from the original pre-training dataset?
Comparison of SFT and Pre-training Datasets
SFT as a Post-Training Phase
Adapting a Model for a New Task
A law firm wants to develop a language model that can take a lengthy legal contract as input and produce a concise, one-paragraph summary highlighting key clauses like the term, liability limits, and governing law. They have a team of paralegals available to create a high-quality dataset of several thousand contract-summary pairs. Which of the following approaches is the most effective and direct way to train the model for this specific task?
A language model is being fine-tuned on a dataset of instruction-response pairs. Consider the following training example:
- Input: What is the capital of France?
- Correct Response: Paris
The model processes the input and must predict the first token of the response. Below are two potential probability distributions (States A and B) that the model could generate for this first token at different points during training.
- State A: {'Paris': 0.15, 'London': 0.10, 'The': 0.08, ...}
- State B: {'Paris': 0.25, 'London': 0.05, 'The': 0.04, ...}
Based on the standard objective for this type of training, which statement provides the most accurate analysis?
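Under MLE, the loss at this position is the negative log-probability assigned to the gold token "Paris", so the two states can be compared directly. A minimal sketch using the probabilities from the question:

```python
import math

# Probability each state assigns to the gold first token "Paris".
state_a = 0.15
state_b = 0.25

# Per-token MLE loss is the negative log-probability of the gold token.
loss_a = -math.log(state_a)
loss_b = -math.log(state_b)

# State B puts more mass on the gold token, so its loss is lower;
# training aims to move the model from State A toward State B.
assert loss_b < loss_a
```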
A language model is being fine-tuned on a dataset. For the input "Translate to French: I love to learn.", the correct starting token for the response is "J'". At a particular step in training, the model produces the following probabilities for the first token:
- Je: 0.35
- J': 0.25
- Le: 0.15
- Mon: 0.10
- (all other tokens): 0.15
Given that the training objective is to maximize the likelihood of the correct sequence, how will the training process adjust the model's parameters in the next immediate step for this specific token prediction?
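The adjustment can be sketched with the standard softmax cross-entropy gradient, where the gradient with respect to each logit is (predicted probability − 1) for the gold token and the predicted probability for every other token. Treating "(all other tokens)" as one bucket and recovering logits as log-probabilities are simplifying assumptions for illustration:

```python
import math

# Probabilities from the question; logits recovered as log-probabilities.
probs = {"Je": 0.35, "J'": 0.25, "Le": 0.15, "Mon": 0.10, "<other>": 0.15}
logits = {tok: math.log(p) for tok, p in probs.items()}
target = "J'"  # gold first token of the response

# One gradient step on the logits: grad_i = p_i - 1[i == target].
lr = 0.5  # illustrative learning rate
for tok in logits:
    grad = probs[tok] - (1.0 if tok == target else 0.0)
    logits[tok] -= lr * grad

# Re-normalize with softmax to get the updated distribution.
z = sum(math.exp(v) for v in logits.values())
new_probs = {tok: math.exp(v) / z for tok, v in logits.items()}

# The step raises the probability of the gold token J' and lowers the
# others, even though "Je" was the model's current top choice.
assert new_probs[target] > probs[target]
assert new_probs["Je"] < probs["Je"]
```

This is the key point of the question: MLE does not reward "Je" for being plausible French; it pushes probability mass toward the exact gold token "J'" and away from every alternative.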
Analyzing Model Behavior Under Maximum Likelihood Estimation