Training with Teacher-Generated Outputs as a Distillation Variant
To circumvent the computational challenge of summing over all possible outputs, a common variant of knowledge distillation trains the student on specific outputs generated by the teacher. For each training sample, the teacher produces an output that then serves as the student's training target, so there is no need to iterate over the entire output space.
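As a rough illustration, the sketch below shows this variant with toy PyTorch models: the teacher greedily decodes a continuation, and the student is trained with ordinary cross-entropy against that single generated sequence rather than against the teacher's full output distribution. The model classes, sizes, vocabulary, and random prompt data are illustrative assumptions, not details from the text above.

```python
# Minimal sketch of distillation with teacher-generated (hard) targets.
# All model/parameter choices below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000   # assumed toy vocabulary size
CTX = 16       # assumed number of tokens the teacher generates

class TinyLM(nn.Module):
    """A toy next-token predictor standing in for a language model."""
    def __init__(self, vocab, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):                 # tokens: (batch, seq)
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                     # logits: (batch, seq, vocab)

teacher = TinyLM(VOCAB, dim=256).eval()        # large, frozen teacher
student = TinyLM(VOCAB, dim=32)                # small student to be trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def teacher_generate(prompt, steps=CTX):
    """Greedily extend the prompt with the teacher; the generated tokens
    become the student's training targets."""
    tokens = prompt.clone()
    with torch.no_grad():
        for _ in range(steps):
            next_logits = teacher(tokens)[:, -1, :]
            next_tok = next_logits.argmax(dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

# One illustrative training step on a random prompt.
prompt = torch.randint(0, VOCAB, (4, 4))        # (batch, prompt_len), toy data
full_seq = teacher_generate(prompt)             # teacher output = pseudo-target

inputs, targets = full_seq[:, :-1], full_seq[:, 1:]
logits = student(inputs)
# Standard cross-entropy against the teacher's single generated sequence:
# the student never has to match the teacher's distribution over the
# whole output space, only this one decoded target.
loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key point of the variant is visible in the loss: each training example costs only a standard cross-entropy computation against one teacher-produced target, instead of an objective that ranges over every possible output.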
A team is developing a small, efficient text-generation model (the 'student') by training it to imitate a much larger, powerful model (the 'teacher'). Their training method requires the student to learn from the full probability distribution the teacher assigns over all possible next words. They discover this is computationally infeasible because their vocabulary contains hundreds of thousands of words, and calculating the training objective for a single example requires summing over this entire vocabulary. Which of the following strategies provides the most practical solution to this specific computational problem while still using the teacher's guidance?