Fine-Tuning as Maximum Likelihood Estimation
In the context of fine-tuning, a common objective is to adjust the model's parameters, θ, to maximize the likelihood of observing the true responses given the prompts in a dataset D. This is achieved by maximizing the sum of the log-likelihoods over all (prompt, response) pairs in the dataset, which is mathematically equivalent to minimizing the negative log-likelihood loss.
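The equivalence between maximizing the summed log-likelihood and minimizing the negative log-likelihood can be sketched numerically. The per-token probabilities below are made-up illustrative values, not from the source; the point is only the sign flip between the two objectives.

```python
import math

# Hypothetical per-token probabilities P(y_t | x, y_<t) that a model
# assigns to the ground-truth response tokens for two (prompt, response)
# pairs. These numbers are illustrative placeholders.
dataset = [
    [0.9, 0.8, 0.7],   # response 1, three tokens
    [0.6, 0.95],       # response 2, two tokens
]

# Maximum-likelihood objective: sum of log-probabilities over the dataset.
log_likelihood = sum(math.log(p) for response in dataset for p in response)

# The training loss actually minimized: the negative log-likelihood.
nll_loss = -log_likelihood

# Same objective up to a sign flip: maximizing one minimizes the other.
assert math.isclose(nll_loss, -log_likelihood)
print(nll_loss)
```

Note that longer responses contribute more (negative) log-probability terms to the sum, which is why a few very long sequences can dominate this objective.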

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fine-Tuning as Maximum Likelihood Estimation
A machine learning engineer is adapting a large, pre-trained language model for a new text classification task. They have a labeled dataset D containing pairs of text inputs (x) and their correct labels (y_gold). The engineer formulates the following objective for the adaptation process, where θ represents the model parameters, which are initialized randomly:
What is the primary conceptual error in this formulation for the specific goal of adapting the pre-trained model?
Notation for Parameters in the Fine-Tuning Process
Comparing Optimization Objectives in Model Training
The objective for fine-tuning a pre-trained model is formally expressed as: Match each component of this objective function to its correct description.
Application Formula for Fine-Tuned BERT Models
Maximum Likelihood Estimation for Sequential Data
Log-Probability Decomposition for Efficient Multi-Turn Dialogue Training
A language model is being trained on a dataset containing a mix of very short sequences and a few extremely long sequences. A developer observes that the overall training objective, which is the sum of the log-probabilities of all sequences in the dataset, seems to be disproportionately influenced by the model's performance on the few long sequences. Which of the following best explains this observation?
Model Parameter Selection via Likelihood
A language model is being trained on a large dataset of text sequences. After a single parameter update, the model's calculated log-probability for one specific sequence in the dataset increases by 2.5, while the log-probabilities for all other sequences in the dataset remain exactly the same. How does this change affect the overall maximum likelihood training objective for the entire dataset?
Standard Optimization Objective for Transformer Language Models
Learn After
Fine-Tuning Objective as Log-Likelihood Maximization
Training Objective as Joint Log-Likelihood Maximization of Concatenated Sequences
A machine learning engineer is fine-tuning a pre-trained language model on a specialized dataset of question-answer pairs. The chosen training objective is to adjust the model's parameters to maximize the sum of the log-probabilities of the ground-truth answers, conditioned on their corresponding questions. Which statement best analyzes the direct effect of this training objective on the model's behavior?
Interpreting Fine-Tuning Loss
Analyzing Fine-Tuning Behavior
When fine-tuning a language model, the objective of maximizing the sum of the log-likelihoods of the true responses given the prompts is mathematically equivalent to minimizing the mean squared error loss over the dataset.