A large model is being trained using a combined objective that incorporates signals from both ground-truth data and a smaller 'teacher' model. Based on a typical diagram of this process, arrange the following computational steps into the correct logical order for a single training update.
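For orientation, here is a minimal sketch of what one such training update might look like. The diagram the question refers to is not reproduced here, so the step order below is an assumption based on a common arrangement, not the diagram's own sequence. The toy "student" has no network: its only parameters are its logits. The names `train_step`, `alpha`, and `lr` are hypothetical, and the distillation term is taken to be a cross-entropy against the teacher's softened outputs, which differs from a KL divergence only by a constant that does not affect the student's gradient.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy of `probs` against a target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def train_step(student_logits, teacher_logits, label, alpha=0.5, lr=0.1):
    """One update of a toy student (assumed step order, see lead-in)."""
    # 1. Teacher forward pass. The teacher is frozen: nothing below updates it.
    teacher_probs = softmax(teacher_logits)

    # 2. Student forward pass.
    student_probs = softmax(student_logits)

    # 3. Supervised loss against the ground-truth label.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    supervised_loss = cross_entropy(one_hot, student_probs)

    # 4. Distillation loss against the teacher's output distribution.
    distill_loss = cross_entropy(teacher_probs, student_probs)

    # 5. Combine the two losses with a weighting factor (hypothetical alpha).
    loss = alpha * supervised_loss + (1 - alpha) * distill_loss

    # 6. Backward pass and update, for the student only. For softmax plus
    #    cross-entropy, the gradient w.r.t. the logits is (probs - target),
    #    so the combined gradient is probs minus the alpha-mixed target.
    mixed_target = [alpha * y + (1 - alpha) * q
                    for y, q in zip(one_hot, teacher_probs)]
    grad = [p - t for p, t in zip(student_probs, mixed_target)]
    new_logits = [z - lr * g for z, g in zip(student_logits, grad)]
    return new_logits, loss


student = [0.0, 0.0, 0.0]
teacher = [2.0, 1.0, 0.0]
student, first_loss = train_step(student, teacher, label=1)
student, second_loss = train_step(student, teacher, label=1)
print(first_loss, second_loss)
```

Steps 1 and 2 are independent of each other and could run in either order; the ordering that matters is that both forward passes precede the loss terms, and that the combined loss is formed before the backward pass.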
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A large model is being trained using a combined objective. This objective includes a 'distillation loss,' which encourages the large model to mimic the outputs of a smaller, weaker 'teacher' model. It also includes a 'supervised loss,' which is calculated against a set of known correct answers (ground-truth). What is the primary analytical reason for including the 'supervised loss' in this training process?
Diagnosing Training Imbalance