Rationale for a Hybrid Training Objective
A team is training a large model using a composite loss function. This function has two parts:
- A component that penalizes the large model when its output differs from a weaker, pre-existing model's output.
- A component that penalizes the large model when its output differs from a small set of human-verified, ground-truth labels.
Analyze the distinct contribution of each of these two components to the overall training process. Why is it beneficial to use both together rather than just one?
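One way to make the two components concrete is a toy numerical sketch. Everything here is an illustrative assumption, not the team's actual implementation: the function names, the choice of KL divergence for the distillation-style term, and the mixing weight `alpha` are all hypothetical.

```python
import math

def cross_entropy(probs, label_index):
    # Supervised component: negative log-likelihood of the human-verified
    # ground-truth label under the large model's output distribution.
    return -math.log(probs[label_index])

def kl_divergence(weak_probs, strong_probs):
    # Distillation-style component: KL(weak || strong), which grows as the
    # large model's distribution drifts away from the weak model's.
    return sum(w * math.log(w / s) for w, s in zip(weak_probs, strong_probs))

def combined_loss(strong_probs, weak_probs, label_index, alpha=0.5):
    # alpha mixes the two terms: alpha=1 ignores the weak model entirely,
    # alpha=0 ignores the ground-truth labels entirely.
    return (alpha * cross_entropy(strong_probs, label_index)
            + (1 - alpha) * kl_divergence(weak_probs, strong_probs))

# Toy 3-token vocabulary: the large model's distribution vs. the weak model's,
# with token 0 as the ground-truth label.
strong = [0.7, 0.2, 0.1]
weak = [0.5, 0.3, 0.2]
loss = combined_loss(strong, weak, 0)
```

In this sketch, either component alone has an obvious failure mode: with `alpha=0` the large model can only imitate the weak model (inheriting its errors), while with `alpha=1` it learns from only the small labeled set and can overfit. The mixed objective lets the plentiful weak-model signal provide broad coverage while the ground-truth term anchors the model to verified answers.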
Tags
Ch.4 Alignment - Foundations of Large Language Models
Computing Sciences
Analysis in Bloom's Taxonomy
Related
Diagnosing a Performance Plateau in Supervised Fine-Tuning
A team is fine-tuning a large language model. They have access to a small, high-quality dataset with verified ground-truth labels, as well as a much larger dataset where labels have been generated by a weaker, smaller model. To maximize the performance of the large model by using both data sources simultaneously, which training objective should they implement?
Visual Diagram of Combined Loss Training for Weak-to-Strong Generalization
Rationale for a Hybrid Training Objective
A research team is fine-tuning a large language model using a combined loss objective, which includes both a standard language model (LM) loss against ground-truth data and a knowledge distillation (KD) loss from a weaker supervisor model. They observe that while the large model is very good at mimicking the style and general structure of the weak supervisor's outputs, it frequently makes factual errors that are not present in the ground-truth dataset. Which of the following is the most likely cause of this issue and the best corrective action?