Distillation Loss for Response-Based Knowledge
The distillation loss for response-based knowledge transfer is calculated as the divergence between the logit vectors from the teacher and student models. This can be formally expressed as $\mathcal{L}_{\text{ResD}}(z_t, z_s) = \mathcal{L}_R(z_t, z_s)$, where $z_t$ and $z_s$ are the logits from the teacher and student models, respectively, and $\mathcal{L}_R(\cdot)$ represents the divergence loss function.
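To make the formula concrete, the following is a minimal PyTorch-style sketch of this loss. The function name `response_distillation_loss` and the choice of mean-squared error as the divergence $\mathcal{L}_R$ are illustrative assumptions, not part of the source definition; other divergences over the logits are equally valid.

```python
import torch
import torch.nn.functional as F

def response_distillation_loss(z_t: torch.Tensor, z_s: torch.Tensor) -> torch.Tensor:
    """Response-based distillation loss L_ResD(z_t, z_s) = L_R(z_t, z_s).

    Mean-squared error on the logits is used here as the divergence L_R;
    this choice (and the function name) is an illustrative assumption.
    """
    # z_t: teacher logits, shape (batch, num_classes); detached so they act as fixed targets.
    # z_s: student logits, same shape; gradients flow only through the student.
    return F.mse_loss(z_s, z_t.detach())

# Example usage with random logits (hypothetical shapes).
teacher_logits = torch.randn(8, 1000)
student_logits = torch.randn(8, 1000, requires_grad=True)
loss = response_distillation_loss(teacher_logits, student_logits)
loss.backward()
```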
Related
Example Use Cases
Benchmark Model
Drawbacks
Distillation Loss for Response-Based Knowledge
Objective Function for Student Model Training via Knowledge Distillation
Definition of Teacher's Probability Distribution ($P_t$) in Knowledge Distillation
Definition of Student's Probability Distribution ($P_\theta^s$)
General Loss Function for Knowledge Distillation
Optimizing a Language Model for Mobile Deployment
A research lab has developed a very large and complex language model that achieves state-of-the-art performance on a translation task. However, due to its size, the model is too slow and expensive to deploy for a real-time translation mobile app. To address this, the team uses the large model's predictions on a set of sentences to train a new, much smaller and faster model. What is the primary strategic advantage of this two-model approach?
A development team is using a knowledge distillation framework to create a compact, efficient language model (the 'student') from a much larger, high-performance model (the 'teacher'). The goal is to deploy the student model on devices with limited computational resources. Which statement best analyzes the typical relationship between the inputs processed by the teacher and student models during this process?
Learn After
An engineer is training a small 'student' model to mimic the predictions of a larger, pre-trained 'teacher' model. The training objective is to make the student's final, pre-activation output vector as similar as possible to the teacher's. If $z_t$ is the teacher's output vector and $z_s$ is the student's output vector for the same input, which of the following loss functions correctly implements this objective?
Rationale for Using Logits in Distillation Loss
When implementing response-based knowledge distillation, the loss function is calculated by first applying a softmax function to the teacher and student model outputs to convert them into probability distributions, and then measuring the divergence between these two distributions.
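The computation described in this statement can be sketched in the same PyTorch style as above. The function name and the use of KL divergence as the measure between the two probability distributions are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def softmax_kl_distillation_loss(z_t: torch.Tensor, z_s: torch.Tensor) -> torch.Tensor:
    """Apply softmax to teacher and student logits to obtain probability
    distributions, then measure the divergence between them.

    KL divergence is used here as the divergence measure; this choice
    (and the function name) is an illustrative assumption.
    """
    p_t = F.softmax(z_t, dim=-1)           # teacher probability distribution
    log_p_s = F.log_softmax(z_s, dim=-1)   # student log-probabilities
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_s, p_t.detach(), reduction="batchmean")

# Example usage with random logits (hypothetical shapes).
teacher_logits = torch.randn(4, 1000)
student_logits = torch.randn(4, 1000, requires_grad=True)
loss = softmax_kl_distillation_loss(teacher_logits, student_logits)
loss.backward()
```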