Learn Before
Combined Training Objective for Knowledge Distillation
In knowledge distillation, the training objective can be formulated by combining the knowledge distillation loss with the standard language modeling loss. This hybrid approach, which can be implemented during either the pre-training or fine-tuning stages, allows the student model to learn simultaneously from the teacher model's probability distribution and the ground-truth labels from the training data.
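The hybrid objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production training loop: the function names, the 3-token vocabulary, and the mixing weight `lam` are all assumptions chosen for clarity, and the distributions are hand-picked rather than produced by real models.

```python
import math

def cross_entropy(p_true, q_student):
    # Standard language-modeling loss against the (one-hot) ground-truth label.
    return -sum(t * math.log(s) for t, s in zip(p_true, q_student) if t > 0)

def kl_divergence(p_teacher, q_student):
    # Distillation loss D_KL(teacher || student): how far the student's
    # distribution strays from the teacher's. Terms with zero teacher
    # probability contribute nothing.
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, q_student) if t > 0)

def combined_loss(p_true, p_teacher, q_student, lam=0.5):
    # Hybrid objective: a lambda-weighted mix of the distillation loss and
    # the standard cross-entropy loss on the ground-truth labels.
    return (lam * kl_divergence(p_teacher, q_student)
            + (1 - lam) * cross_entropy(p_true, q_student))

# Illustrative values over a 3-token vocabulary.
ground_truth = [1.0, 0.0, 0.0]   # one-hot label from the training data
teacher      = [0.8, 0.1, 0.1]   # teacher's (softened) output distribution
student      = [0.7, 0.2, 0.1]   # student's current prediction
loss = combined_loss(ground_truth, teacher, student, lam=0.5)
```

With `lam = 0` this reduces to ordinary supervised training; with `lam = 1` the student learns only from the teacher's distribution.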
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Combined Training Objective for Knowledge Distillation
In a model training setup, a smaller 'student' model is trained to mimic the output probability distribution of a larger 'teacher' model for a given input. The training objective is to minimize the Kullback-Leibler (KL) divergence between the two distributions; the standard loss function is defined as L = D_KL(p_teacher ‖ p_student). A researcher proposes an alternative loss function with the arguments reversed, L' = D_KL(p_student ‖ p_teacher). How would minimizing L' instead of L most likely change the student model's behavior?
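Because KL divergence is asymmetric, swapping its arguments changes the objective. The sketch below assumes the alternative loss reverses the KL arguments (student first, teacher second); the specific distributions are made up purely to exhibit the asymmetry.

```python
import math

def kl(p, q):
    # D_KL(p || q); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.50, 0.49, 0.01]   # teacher puts almost no mass on class 3
student = [0.34, 0.33, 0.33]   # student spreads mass nearly uniformly

forward = kl(teacher, student)  # standard loss L  = D_KL(teacher || student)
reverse = kl(student, teacher)  # assumed      L' = D_KL(student || teacher)
```

Here `reverse` is much larger than `forward`: the reversed loss heavily penalizes the student for placing probability where the teacher's is near zero, which is why reverse KL is often described as mode-seeking while forward KL is mass-covering.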
Evaluating Student Model Performance
In a knowledge distillation process, a 'teacher' model produces a probability distribution of [0.8, 0.1, 0.1] over three classes for a given input. Four different 'student' models are being evaluated on the same input, producing the distributions below. Which student model's output distribution is being most effectively guided by the teacher, as measured by the standard Kullback-Leibler (KL) divergence loss function?
Adjusting the Distillation Loss Coefficient
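A comparison like the one in the question above can be carried out numerically: compute D_KL(teacher ‖ student) for each candidate and pick the smallest. The four candidate distributions here are illustrative stand-ins (the question's original candidates are not reproduced in this excerpt); only the teacher distribution [0.8, 0.1, 0.1] comes from the text.

```python
import math

def kl(p, q):
    # D_KL(p || q); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.8, 0.1, 0.1]

# Hypothetical student outputs, ordered from closest to the teacher (A)
# to nearly uniform (D).
students = {
    "A": [0.75, 0.15, 0.10],
    "B": [0.60, 0.20, 0.20],
    "C": [0.40, 0.30, 0.30],
    "D": [0.34, 0.33, 0.33],
}

losses = {name: kl(teacher, dist) for name, dist in students.items()}
best = min(losses, key=losses.get)  # lowest KL = most effectively guided
```

The student with the lowest KL divergence from the teacher is the one the distillation loss considers best guided.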
Learn After
Combined Training Objective Formula for Knowledge Distillation
Dynamic Adjustment of the Knowledge Distillation Coefficient (λ)
Optimizing Student Model Training
When training a smaller 'student' model using a combined objective that learns from both a larger 'teacher' model and the ground-truth data, what is the primary role of the component that learns directly from the ground-truth data?
A student model is being trained using a combined objective that incorporates learning from both a larger 'teacher' model and the ground-truth data. Match each learning source with its primary contribution to the student model's training process.