Optimizing a Student Model's Training
Based on the scenario, what specific change to the weighting between the two training objectives would you recommend to improve the small model's final performance? Justify your recommendation by explaining how this change would affect the model's learning process over time.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Optimizing a Student Model's Training
An engineer is training a student language model using a combined objective that balances learning from a teacher model's predictions (distillation loss) and learning from the ground-truth data (standard loss). The interpolation coefficient, λ, weighs the teacher's influence. The engineer observes that the student model quickly learns to mimic the teacher's output, but its performance on a validation set eventually plateaus and fails to surpass the teacher's performance, even though the student has the capacity to do better. What is the most probable cause of this issue related to the adjustment of λ?
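For readers who want to see the combined objective concretely, here is a minimal sketch for a single prediction. It assumes the interpolation takes the convex form λ · distillation loss + (1 − λ) · standard loss, and that the distillation term is the cross-entropy between the teacher's and the student's output distributions; neither detail is stated in the question, and the function names `softmax` and `combined_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(student_logits, teacher_logits, target_index, lam):
    """Interpolate between distillation loss and standard loss.

    lam is the coefficient λ from the question: lam = 1 trains only on the
    teacher's predictions, lam = 0 trains only on the ground-truth label.
    The convex-combination form is an assumption made for illustration.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)

    # Distillation term: cross-entropy of the student against the teacher's
    # full output distribution (soft targets).
    distill = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

    # Standard term: cross-entropy against the ground-truth class.
    standard = -math.log(p_student[target_index])

    return lam * distill + (1.0 - lam) * standard
```

With `lam` held close to 1 for the whole run, the second term contributes almost nothing, so the gradient keeps pulling the student toward the teacher's distribution rather than toward the ground-truth labels.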
A student model is being trained using a combined objective that includes a term for learning from a teacher model, weighted by a coefficient λ. Arrange the following training stages in the order that corresponds to a typical and effective dynamic adjustment schedule for λ, from the highest value of λ to the lowest.
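As an illustration of what a dynamic adjustment schedule for λ can look like in code, the sketch below decays λ linearly over training. The linear shape, the endpoint values 0.9 and 0.1, and the name `lambda_schedule` are all assumptions chosen for the example; the question does not specify a particular schedule.

```python
def lambda_schedule(step, total_steps, lam_start=0.9, lam_end=0.1):
    """Linearly move λ from lam_start to lam_end over total_steps.

    All defaults are illustrative assumptions, not values from the course.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lam_start + (lam_end - lam_start) * progress
```

Calling `lambda_schedule(step, total_steps)` once per training step and passing the result as the teacher weight gives the teacher more influence early on and the ground-truth data more influence later.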