Learn Before
Formula
Context Distillation Loss Function
Knowledge distillation in the context distillation method is performed by minimizing a loss function defined on the outputs of the teacher and student models:

$$\mathcal{L}(\theta) = \mathrm{KL}\Big(\mathrm{Pr}_{\mathrm{teacher}}(\cdot \mid c, x)\,\Big\|\,\mathrm{Pr}_{\theta}(\cdot \mid x)\Big)$$

where $\mathrm{Pr}_{\mathrm{teacher}}$ denotes the pre-trained teacher model, which is prompted with the context $c$ together with the input $x$, and $\mathrm{Pr}_{\theta}$ denotes the student model with the parameters $\theta$, which receives only the input $x$. Minimizing the KL divergence trains the student to reproduce, without the context, the output distribution the teacher produces with the context.
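As a minimal sketch of this loss, the following pure-Python example computes the KL divergence between a teacher's next-token distribution (obtained with the context) and a student's distribution (obtained without it). The logit values, the function names, and the 4-word vocabulary are illustrative assumptions, not part of the original method description.

```python
import math

def softmax(logits):
    # Convert raw logits to a probability distribution.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), skipping zero-probability terms.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def context_distillation_loss(teacher_logits, student_logits):
    # Teacher logits come from the model prompted with the context c;
    # student logits come from the context-free student model.
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return kl_divergence(p_teacher, p_student)

# Toy next-token logits over a 4-word vocabulary (illustrative values).
teacher_logits = [2.0, 0.5, -1.0, 0.1]  # teacher sees context + input
student_logits = [1.2, 0.8, -0.5, 0.0]  # student sees the input only
print(context_distillation_loss(teacher_logits, student_logits))
```

In practice this would be computed per token position with automatic differentiation, and the gradient would update only the student parameters $\theta$; the loss is zero exactly when the two distributions match.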
Updated 2026-04-30
Tags
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences