Learn Before
Stabilizing Model Fine-Tuning
During the fine-tuning of a large language model, an engineer observes that the model's outputs are rapidly degrading, becoming nonsensical and repetitive after only a few training steps. Briefly explain how introducing a fixed, pre-trained version of the model as a baseline to compare against during the training process could mitigate this issue.
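The mitigation the question points at is commonly implemented as a divergence penalty: the frozen pre-trained model's next-token probabilities act as an anchor, and the active model is penalized in proportion to how far its probabilities drift from that anchor. Below is a minimal, self-contained sketch of this idea using a per-token KL divergence; the function names, the `beta` weight, and the plain-list probability representation are illustrative assumptions, not the implementation from any specific library.

```python
import math

def kl_penalty(active_probs, baseline_probs):
    """KL(active || baseline) over one next-token distribution.

    Small when the fine-tuned (active) model stays close to the frozen
    pre-trained baseline; large when its outputs drift, which is the
    degradation the penalty is meant to discourage.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(active_probs, baseline_probs)
        if p > 0  # terms with p == 0 contribute nothing to the KL sum
    )

def penalized_loss(task_loss, active_probs, baseline_probs, beta=0.1):
    """Combine the task objective with a beta-weighted divergence penalty.

    beta trades off task improvement against staying anchored to the
    baseline; its value here is an illustrative assumption.
    """
    return task_loss + beta * kl_penalty(active_probs, baseline_probs)

# Identical distributions incur no penalty...
same = penalized_loss(1.0, [0.5, 0.5], [0.5, 0.5])
# ...while a distribution that has drifted from the baseline costs more.
drifted = penalized_loss(1.0, [0.9, 0.1], [0.5, 0.5])
```

Because the baseline is fixed, the penalty grows whenever training pushes the active model toward degenerate outputs the pre-trained model would never produce, pulling it back toward its original, coherent behavior.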
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is fine-tuning a large language model (the 'active model') to improve its performance on a specific task. They use the original, pre-trained version of the model as a fixed baseline. During training, a penalty is applied to the active model whenever its output probabilities for generating the next piece of text diverge significantly from the baseline model's probabilities. What is the most likely reason for incorporating this penalty mechanism?
Analysis of Constrained vs. Unconstrained Model Training
Stabilizing Model Fine-Tuning