Concept

Combined Training Objective for Knowledge Distillation

In knowledge distillation, the training objective can be formulated as a weighted combination of the knowledge distillation loss and the standard language modeling loss. This hybrid approach, which can be applied during either the pre-training or the fine-tuning stage, lets the student model learn simultaneously from the teacher model's output probability distribution (soft targets) and from the ground-truth labels in the training data (hard targets).
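A common way to write this objective (following the convention popularized by Hinton et al.) is L = α · L_KD + (1 − α) · L_CE, where L_KD is a temperature-softened KL divergence between teacher and student distributions and L_CE is cross-entropy against the ground-truth label. The sketch below is a minimal pure-Python illustration of one training step's loss for a single token position; the weight `alpha`, the `temperature`, and the `T²` rescaling of the KD term are illustrative hyperparameter choices, not values prescribed by the text above.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits,
    optionally softened by a temperature > 1."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def combined_kd_loss(student_logits, teacher_logits, true_label,
                     alpha=0.5, temperature=2.0):
    """Weighted sum of the distillation loss (soft targets) and the
    standard cross-entropy loss (hard targets). alpha and temperature
    are illustrative hyperparameters."""
    # Soft targets: KL divergence between the teacher's and the
    # student's distributions, both softened by the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student))
    # Hard targets: cross-entropy against the ground-truth label,
    # computed at temperature 1.
    ce = -math.log(softmax(student_logits)[true_label])
    # The T**2 factor is a common convention that keeps the gradient
    # magnitude of the KD term comparable across temperatures.
    return alpha * (temperature ** 2) * kd + (1 - alpha) * ce
```

In practice both terms are averaged over all token positions in a batch, and the same formulation carries over to framework implementations (e.g. a KL-divergence loss plus a cross-entropy loss in a deep learning library).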

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences