Learn Before
A team is fine-tuning a large language model (the 'active model') to improve its performance on a specific task. They use the original, pre-trained version of the model as a fixed baseline. During training, a penalty is applied to the active model whenever its output probabilities for generating the next piece of text diverge significantly from the baseline model's probabilities. What is the most likely reason for incorporating this penalty mechanism?
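The penalty described here is typically a KL-divergence term between the active model's next-token distribution and the frozen baseline's, scaled by a coefficient and subtracted from the task reward. The sketch below is a minimal, library-free illustration of that idea; the function names, the `beta` coefficient, and the toy distributions are assumptions for illustration, not part of the original question.

```python
import math

def kl_penalty(active_probs, baseline_probs):
    """Per-token KL divergence KL(active || baseline).

    Grows as the active model's next-token probabilities drift
    away from the frozen baseline's; zero when they match.
    """
    return sum(p * math.log(p / q)
               for p, q in zip(active_probs, baseline_probs)
               if p > 0)

def penalized_reward(task_reward, active_probs, baseline_probs, beta=0.1):
    """Task reward minus a scaled divergence penalty (beta is a
    hypothetical tuning coefficient), discouraging the active model
    from straying too far from the pre-trained baseline."""
    return task_reward - beta * kl_penalty(active_probs, baseline_probs)

# If the active model matches the baseline, no penalty is applied;
# the more its distribution diverges, the larger the deduction.
same = penalized_reward(1.0, [0.5, 0.5], [0.5, 0.5])      # no divergence
drift = penalized_reward(1.0, [0.9, 0.1], [0.5, 0.5])     # diverged
```

Keeping this penalty in the training objective is what prevents the active model from exploiting the task signal at the cost of forgetting the broad capabilities of the pre-trained baseline.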
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of Constrained vs. Unconstrained Model Training
Stabilizing Model Fine-Tuning