Learn Before
Teacher-Student Model Architecture in Knowledge Distillation
In a knowledge distillation framework, a larger and more powerful 'teacher' model is used to train a 'student' model that is designed to be smaller and more efficient. The teacher model processes a full-context user input to generate its output probability distribution, denoted P^t. In contrast, the student model processes a simplified context input to produce its own output distribution, P_theta^s. The training objective is to transfer knowledge from the stronger teacher to the compact student by minimizing a loss function that measures the difference between these two distributions.
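The loss described above can be sketched in a few lines. This is a minimal illustration, not code from the source: it assumes a common choice of loss (KL divergence between temperature-softened output distributions); the function names and the temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature softens the distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(P^t || P_theta^s): measures how far the student's output
    # distribution diverges from the teacher's softened distribution.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))

# Identical outputs give zero loss; divergent outputs give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))       # 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)   # True
```

In practice the student's parameters (theta) are updated by gradient descent on this loss, so the student learns to reproduce the teacher's distribution from its simplified input.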
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Learn After
Distillation Loss for Response-Based Knowledge
Objective Function for Student Model Training via Knowledge Distillation
Definition of Teacher's Probability Distribution (Pt) in Knowledge Distillation
Definition of Student's Probability Distribution (P_theta^s)
General Loss Function for Knowledge Distillation
Optimizing a Language Model for Mobile Deployment
A research lab has developed a very large and complex language model that achieves state-of-the-art performance on a translation task. However, due to its size, the model is too slow and expensive to deploy for a real-time translation mobile app. To address this, the team uses the large model's predictions on a set of sentences to train a new, much smaller and faster model. What is the primary strategic advantage of this two-model approach?
A development team is using a knowledge distillation framework to create a compact, efficient language model (the 'student') from a much larger, high-performance model (the 'teacher'). The goal is to deploy the student model on devices with limited computational resources. Which statement best analyzes the typical relationship between the inputs processed by the teacher and student models during this process?