Learn Before
Formula
Target-Generated Output Loss for Context Distillation
To overcome the computational infeasibility of the sequence-level loss, a variant of context distillation trains the student model on specific outputs generated by the teacher model. For each input $x$, the teacher, conditioned on the context $c$, produces an output $\hat{y}$, which is then treated as the target for the student. The simplified loss function becomes:

$$\mathcal{L}(\theta) \;=\; -\log \Pr_{\theta}(\hat{y} \mid x)$$

where $\theta$ denotes the parameters of the student model. In other words, the student maximizes the likelihood of the teacher's context-conditioned output while seeing only the input $x$, without the context.
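To make the loss concrete, here is a minimal PyTorch-style sketch of one training step, not drawn from the book itself: the `TinyLM` architecture, the vocabulary size, and the `greedy_decode` helper are all hypothetical stand-ins. The teacher sees the concatenated context and input and generates $\hat{y}$; the student is then trained to reproduce $\hat{y}$ from the input alone.

```python
# Sketch of the target-generated output loss for context distillation.
# TinyLM, VOCAB/DIM/MAX_LEN, and greedy_decode are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, MAX_LEN = 100, 32, 8

class TinyLM(nn.Module):
    """Toy autoregressive LM: embed tokens, run a GRU, predict the next token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                     # logits: (batch, seq, VOCAB)

@torch.no_grad()
def greedy_decode(model, prefix, steps=MAX_LEN):
    """Teacher generates y_hat token by token, given its prefix [c; x]."""
    tokens = prefix.clone()
    for _ in range(steps):
        next_tok = model(tokens)[:, -1].argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens[:, prefix.size(1):]           # keep only the generated part

teacher, student = TinyLM(), TinyLM()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

context = torch.randint(0, VOCAB, (1, 4))      # c: visible to the teacher only
x = torch.randint(0, VOCAB, (1, 4))            # one input sample

# Teacher, conditioned on [c; x], produces the target output y_hat.
y_hat = greedy_decode(teacher, torch.cat([context, x], dim=1))

# Student is trained WITHOUT the context: loss = -log p_student(y_hat | x).
inp = torch.cat([x, y_hat[:, :-1]], dim=1)     # teacher-forced student input
logits = student(inp)[:, x.size(1) - 1:]       # positions that predict y_hat
loss = F.cross_entropy(logits.reshape(-1, VOCAB), y_hat.reshape(-1))

opt.zero_grad()
loss.backward()
opt.step()
print(f"-log p_student(y_hat | x) = {loss.item():.3f}")
```

Note that $\hat{y}$ is generated under `no_grad`, so gradients flow only through the student; this is what makes the variant tractable compared with summing the sequence-level loss over all possible outputs.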
Updated 2026-04-30
Tags
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences