Learn Before
Combined Training Objective for Knowledge Distillation
In knowledge distillation, the training objective can be formulated by combining the knowledge distillation loss with the standard language modeling loss. This hybrid approach, which can be implemented during either the pre-training or fine-tuning stages, allows the student model to learn simultaneously from the teacher model's probability distribution and the ground-truth labels from the training data.
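The hybrid objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production training loop: the function names, the 3-token vocabulary, and the mixing weight `lam` are all assumptions chosen for clarity, and the distributions are hand-picked rather than produced by real models.

```python
import math

def cross_entropy(p_true, q_student):
    # Standard language-modeling loss against the (one-hot) ground-truth label.
    return -sum(t * math.log(s) for t, s in zip(p_true, q_student) if t > 0)

def kl_divergence(p_teacher, q_student):
    # Distillation loss D_KL(teacher || student): how far the student's
    # distribution strays from the teacher's. Terms with zero teacher
    # probability contribute nothing.
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, q_student) if t > 0)

def combined_loss(p_true, p_teacher, q_student, lam=0.5):
    # Hybrid objective: a lambda-weighted mix of the distillation loss and
    # the standard cross-entropy loss on the ground-truth labels.
    return (lam * kl_divergence(p_teacher, q_student)
            + (1 - lam) * cross_entropy(p_true, q_student))

# Illustrative values over a 3-token vocabulary.
ground_truth = [1.0, 0.0, 0.0]   # one-hot label from the training data
teacher      = [0.8, 0.1, 0.1]   # teacher's (softened) output distribution
student      = [0.7, 0.2, 0.1]   # student's current prediction
loss = combined_loss(ground_truth, teacher, student, lam=0.5)
```

With `lam = 0` this reduces to ordinary supervised training; with `lam = 1` the student learns only from the teacher's distribution.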
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Combined Training Objective for Knowledge Distillation
In a model training setup, a smaller 'student' model is trained to mimic the output probability distribution of a larger 'teacher' model for a given input. The training objective is to minimize the Kullback-Leibler (KL) divergence between the two distributions; the standard loss function is defined as L = D_KL(p_teacher ‖ p_student). A researcher proposes an alternative loss function with the arguments reversed, L' = D_KL(p_student ‖ p_teacher). How would minimizing L' instead of L most likely change the student model's behavior?
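Because KL divergence is asymmetric, swapping its arguments changes the objective. The sketch below assumes the alternative loss reverses the KL arguments (student first, teacher second); the specific distributions are made up purely to exhibit the asymmetry.

```python
import math

def kl(p, q):
    # D_KL(p || q); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.50, 0.49, 0.01]   # teacher puts almost no mass on class 3
student = [0.34, 0.33, 0.33]   # student spreads mass nearly uniformly

forward = kl(teacher, student)  # standard loss L  = D_KL(teacher || student)
reverse = kl(student, teacher)  # assumed      L' = D_KL(student || teacher)
```

Here `reverse` is much larger than `forward`: the reversed loss heavily penalizes the student for placing probability where the teacher's is near zero, which is why reverse KL is often described as mode-seeking while forward KL is mass-covering.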
Evaluating Student Model Performance
In a knowledge distillation process, a 'teacher' model produces a probability distribution of [0.8, 0.1, 0.1] over three classes for a given input. Four different 'student' models are being evaluated on the same input, producing the distributions below. Which student model's output distribution is being most effectively guided by the teacher, as measured by the standard Kullback-Leibler (KL) divergence loss function?
Adjusting the Distillation Loss Coefficient
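A comparison like the one in the question above can be carried out numerically: compute D_KL(teacher ‖ student) for each candidate and pick the smallest. The four candidate distributions here are illustrative stand-ins (the question's original candidates are not reproduced in this excerpt); only the teacher distribution [0.8, 0.1, 0.1] comes from the text.

```python
import math

def kl(p, q):
    # D_KL(p || q); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.8, 0.1, 0.1]

# Hypothetical student outputs, ordered from closest to the teacher (A)
# to nearly uniform (D).
students = {
    "A": [0.75, 0.15, 0.10],
    "B": [0.60, 0.20, 0.20],
    "C": [0.40, 0.30, 0.30],
    "D": [0.34, 0.33, 0.33],
}

losses = {name: kl(teacher, dist) for name, dist in students.items()}
best = min(losses, key=losses.get)  # lowest KL = most effectively guided
```

The student with the lowest KL divergence from the teacher is the one the distillation loss considers best guided.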
Learn After
Combined Training Objective Formula for Knowledge Distillation
Dynamic Adjustment of the Knowledge Distillation Coefficient (λ)
Optimizing Student Model Training
When training a smaller 'student' model using a combined objective that learns from both a larger 'teacher' model and the ground-truth data, what is the primary role of the component that learns directly from the ground-truth data?
A student model is being trained using a combined objective that incorporates learning from both a larger 'teacher' model and the ground-truth data. Match each learning source with its primary contribution to the student model's training process.