1Cademy - A machine learning team is developing a compact language model (the student) by training it to learn from a much larger, high-performing model (the teacher). They conduct two experiments with identical student model architectures:<br><br>* **Experiment 1:** The student model is trained solely by minimizing the difference between its final output predictions and the teacher models final output predictions.<br>* **Experiment 2:** In addition to matching the final predictions, the student model is

Learn Before

Multi-level Knowledge Distillation in BERT

Multiple Choice

A machine learning team is developing a compact language model (the 'student') by training it to learn from a much larger, high-performing model (the 'teacher'). They conduct two experiments with identical student model architectures:

Experiment 1: The student model is trained solely by minimizing the difference between its final output predictions and the teacher model's final output predictions.
Experiment 2: In addition to matching the final predictions, the student model is

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related