Learn Before
Maximum Likelihood Estimation (MLE) Objective in Supervised Language Model Training
In standard supervised training, the objective for a Large Language Model is to maximize the probability of generating a correct 'gold-standard' output sequence, y, given an input, x. This is achieved through Maximum Likelihood Estimation (MLE), where the model, which produces a series of token distributions, is trained to align these predictions with the one-hot distributions representing the target sequence. The formal objective is to maximize the conditional probability P(y | x), which factorizes into the product of the model's per-token probabilities along the target sequence.
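As a minimal sketch of this factorization (the per-token probabilities below are made-up illustrative values, not from the source), P(y | x) is the product of the model's probabilities for each gold token, and maximizing it is equivalent to minimizing the negative log-likelihood:

```python
import math

# Hypothetical probabilities the model assigns to each gold token,
# i.e. P(token_t | x, gold tokens before t). Illustrative values only.
gold_token_probs = [0.7, 0.5, 0.9]

# MLE maximizes P(y | x) = product of per-token probabilities ...
sequence_prob = math.prod(gold_token_probs)

# ... which is equivalent to minimizing the negative log-likelihood,
# i.e. the cross-entropy against the one-hot target distributions.
nll = -sum(math.log(p) for p in gold_token_probs)

print(f"P(y|x) = {sequence_prob:.3f}")  # 0.315
print(f"NLL    = {nll:.3f}")            # 1.155
```

Maximizing P(y | x) and minimizing the NLL are the same objective; the log form is the one used in practice because it turns the product into a numerically stable sum.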
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Analogy to NLP Data Augmentation in Synthetic Data Generation
Limitation of Relying on Human-Crafted Inputs for Synthetic Data Generation
Proven Utility of Synthetic Data in Well-Tuned LLMs
Generating Fine-Tuning Data with Crowdsourced Questions and LLM-Generated Answers
Using a Well-Tuned LLM to Generate Fine-Tuning Data for a New LLM
Data Generation Strategy for a Specialized AI Assistant
Generating Synthetic Data with a Weak LLM for Instruction Fine-Tuning
A small research lab with a limited budget aims to fine-tune a language model for a specialized task: summarizing complex legal documents. They need a large dataset of 'legal text' and 'corresponding summary' pairs. Considering their resource constraints, which of the following is the most efficient and scalable strategy for creating this dataset?
Evaluating Data Generation Strategies
Learn After
A language model is being trained with a supervised objective to maximize the probability of the correct output. Given the input 'The largest city in the US is', the target output is the two-token sequence 'New York'. Two different models are evaluated on this single instance.
- Model A predicts the first token 'New' with a probability of 0.6, and then predicts the second token 'York' with a probability of 0.8.
- Model B predicts the first token 'New' with a probability of 0.9, and then predicts the second token 'York' with a probability of 0.4.
Based on the standard training objective for this task, which statement correctly analyzes the models' performance on this specific example?
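A short worked sketch of how the standard objective scores the two models (this computation is added here for clarity and is not part of the original question):

```python
import math

# Per-token probabilities from the scenario above.
model_a = [0.6, 0.8]  # P('New'), then P('York' | 'New')
model_b = [0.9, 0.4]

for name, probs in [("Model A", model_a), ("Model B", model_b)]:
    seq_prob = math.prod(probs)             # P('New York' | input)
    nll = -sum(math.log(p) for p in probs)  # the loss MLE minimizes
    print(f"{name}: P(y|x) = {seq_prob:.2f}, NLL = {nll:.3f}")

# Model A: P(y|x) = 0.48, NLL = 0.734
# Model B: P(y|x) = 0.36, NLL = 1.022
```

Under MLE it is the whole-sequence probability that matters, so Model A attains the lower loss on this instance despite its weaker first-token prediction.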
Analyzing Model Training with Flawed Data
Limitations of Supervised Fine-Tuning for LLM Alignment
Parameter Updates in Supervised LLM Training