Parameter Updates in Supervised LLM Training
Imagine a language model being trained on the following example:
Input: 'The capital of France is'
Target next token: 'Paris'
During this training step, the model's current prediction for the next token assigns 'London' a probability of 0.3, 'Paris' a probability of 0.2, and 'Berlin' a probability of 0.1, with the remaining probability mass spread over other tokens.
Based on the standard objective of maximizing the likelihood of the correct output, describe how the model's internal parameters will be adjusted in response to this specific training example.
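To make the expected answer concrete, here is a minimal sketch of the maximum-likelihood update for this single example, assuming the usual setup of a softmax output layer trained with cross-entropy loss (the vocabulary and probabilities are taken from the question; the `<other>` bucket is a simplification):

```python
import numpy as np

# Hypothetical next-token distribution from the question
vocab = ["London", "Paris", "Berlin", "<other>"]
probs = np.array([0.3, 0.2, 0.1, 0.4])
target = vocab.index("Paris")

# Cross-entropy loss for this example: -log p(target)
loss = -np.log(probs[target])  # -log(0.2) ~= 1.609

# For a softmax output layer, the gradient of the loss with
# respect to the logits is (probs - one_hot(target)): negative
# for 'Paris' (its logit is pushed up), positive for every other
# token (their logits are pushed down).
one_hot = np.zeros_like(probs)
one_hot[target] = 1.0
grad_logits = probs - one_hot  # [ 0.3, -0.8, 0.1, 0.4]
```

The sign pattern of `grad_logits` is the core of the answer: gradient descent increases the probability assigned to 'Paris' and decreases the probability of 'London', 'Berlin', and all other tokens.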
Related
A language model is being trained with a supervised objective to maximize the probability of the correct output. Given the input 'The largest city in the US is', the target output is the two-token sequence 'New York'. Two different models are evaluated on this single instance.
- Model A predicts the first token 'New' with a probability of 0.6, and then predicts the second token 'York' with a probability of 0.8.
- Model B predicts the first token 'New' with a probability of 0.9, and then predicts the second token 'York' with a probability of 0.4.
Based on the standard training objective for this task, which statement correctly analyzes the models' performance on this specific example?
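Under the standard objective, a multi-token output is scored by the product of its per-token probabilities (equivalently, the sum of log-probabilities under the chain rule). A short sketch of that comparison, using the numbers from the question:

```python
import math

# Per-token probabilities from the question, following the
# autoregressive factorization p('New York') = p('New') * p('York' | 'New')
model_a = [0.6, 0.8]
model_b = [0.9, 0.4]

def sequence_log_likelihood(token_probs):
    """Log-likelihood of a token sequence under the chain rule."""
    return sum(math.log(p) for p in token_probs)

lik_a = math.exp(sequence_log_likelihood(model_a))  # 0.6 * 0.8 = 0.48
lik_b = math.exp(sequence_log_likelihood(model_b))  # 0.9 * 0.4 = 0.36
```

Model A assigns the higher probability (0.48 vs. 0.36) to the full sequence 'New York', so it incurs the lower cross-entropy loss on this example even though Model B is more confident about the first token.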