
Training with Teacher-Generated Outputs as a Distillation Variant

To circumvent the computational challenge of summing over all possible outputs, a common variant of knowledge distillation trains the student model on specific outputs generated by the teacher model. For each training sample, the teacher produces a concrete output, which then serves as the training target for the student; the loss reduces to ordinary cross-entropy on that single sequence, so there is no need to enumerate the entire output space.
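
As an illustration, the following is a minimal sketch of this variant in a sequence-to-sequence setting, assuming a Hugging Face Transformers teacher/student pair. The checkpoint names and the toy input are hypothetical placeholders, not taken from the original text.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher_name = "teacher-model"   # hypothetical checkpoint name
student_name = "student-model"   # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name).eval()
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

inputs = ["translate English to German: Hello, world."]  # toy batch

# Step 1: the teacher generates one concrete output per input sample.
enc = tokenizer(inputs, return_tensors="pt", padding=True)
with torch.no_grad():
    teacher_ids = teacher.generate(**enc, max_new_tokens=32)

# Step 2: the teacher's generated sequence becomes the student's target,
# so the loss is ordinary cross-entropy on this single sequence rather
# than a sum over the entire output space.
labels = teacher_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
loss = student(**enc, labels=labels).loss
loss.backward()
optimizer.step()

Because the student only ever sees the teacher's single generated sequence per sample, the training cost is the same as ordinary supervised fine-tuning on a fixed dataset.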
