1Cademy - An engineer is training a small student model by learning from a larger teacher model. The training objective is to find the student parameters (θ) that maximize a combined score, formulated as: $$ \text{score} = \sum (\text{Term A} - \lambda \cdot \text{Term B}) $$ where Term A measures how well the student predicts the correct, ground-truth answers, and Term B measures how closely the student's outputs match the teacher's outputs. After training, the engineer notices the student model

Learn Before

Combined Training Objective Formula for Knowledge Distillation

Multiple Choice

An engineer is training a small 'student' model by learning from a larger 'teacher' model. The training objective is to find the student parameters (θ) that maximize a combined score, formulated as: $\text{score} = \sum (\text{Term A} - \lambda \cdot \text{Term B})$ where 'Term A' measures how well the student predicts the correct, ground-truth answers, and 'Term B' measures how closely the student's outputs match the teacher's outputs. After training, the engineer notices the student model
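To make the roles of the two terms and of λ concrete, the combined score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation the question itself fixes: Term A is assumed to be the log-probability the student assigns to the ground-truth label, and Term B is assumed to be the KL divergence from the teacher's distribution to the student's (a mismatch measure, so a smaller value means a closer match). The names `softmax`, `combined_score`, and `lam` are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_score(student_logits, teacher_logits, labels, lam):
    """Sum over examples of (Term A - lam * Term B).

    Assumed definitions (not given in the source):
      Term A = log-probability the student gives the ground-truth label
      Term B = KL(teacher || student), a mismatch measure (0 when identical)
    """
    score = 0.0
    for s_logits, t_logits, y in zip(student_logits, teacher_logits, labels):
        p_s = softmax(s_logits)
        p_t = softmax(t_logits)
        term_a = math.log(p_s[y])
        term_b = sum(
            pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0
        )
        score += term_a - lam * term_b
    return score


# One example, two classes, ground-truth label 0.
student = [[2.0, 0.0]]
teacher = [[0.0, 2.0]]
labels = [0]

# With lam = 0 only the ground-truth term counts; raising lam increases the
# penalty for disagreeing with the teacher.
score_no_teacher = combined_score(student, teacher, labels, lam=0.0)
score_with_teacher = combined_score(student, teacher, labels, lam=1.0)
```

Under these assumptions, λ sets how strongly disagreement with the teacher is penalized relative to fitting the ground-truth labels: at λ = 0 the teacher has no influence, and as λ grows the teacher-matching term dominates the score.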

Updated 2025-09-26
