
Dynamic Adjustment of the Knowledge Distillation Coefficient (λ)

In a combined knowledge distillation objective, which can be applied during either pre-training or fine-tuning, the teacher model's influence is controlled by an interpolation coefficient λ that weights the distillation loss against the standard language modeling loss. A common strategy is to gradually decrease λ as the student model's performance improves. This shifts the training focus away from mimicking the teacher and toward learning directly from the ground-truth data via the standard language modeling loss.
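The interpolation described above can be sketched in a few lines. This is a minimal illustration, not the method from any particular paper: the per-token loss is λ · (distillation cross-entropy against the teacher's soft targets) + (1 − λ) · (cross-entropy against the ground-truth label), and `lam_schedule` is a hypothetical linear decay standing in for "decrease λ as the student improves" (schedules keyed to validation performance are equally valid).

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, lam, temperature=2.0):
    """Combined objective: lam * distillation loss + (1 - lam) * LM loss.

    The temperature softens both distributions for the distillation term,
    a common (but optional) choice in distillation setups.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Distillation term: cross-entropy of student against teacher soft targets.
    distill = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # Standard language modeling term: cross-entropy against the true label.
    lm = -math.log(softmax(student_logits)[label])
    return lam * distill + (1.0 - lam) * lm

def lam_schedule(step, total_steps, lam_start=1.0, lam_end=0.0):
    """Hypothetical linear decay of lambda over training steps."""
    frac = min(step / total_steps, 1.0)
    return lam_start + frac * (lam_end - lam_start)
```

Early in training (λ near 1) the loss is dominated by the teacher-matching term; late in training (λ near 0) it reduces to the ordinary language modeling loss on the ground-truth data.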


Updated 2026-05-01


Tags: Ch.4 Alignment - Foundations of Large Language Models, Foundations of Large Language Models, Computing Sciences, Foundations of Large Language Models Course