Learn Before
When implementing response-based knowledge distillation, the loss function is calculated by first applying a softmax function (often with a temperature to soften the distributions) to the teacher and student model outputs to convert them into probability distributions, and then measuring the divergence between these two distributions.
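The statement above can be sketched in code. This is a minimal NumPy illustration of the two steps it describes (temperature-softened softmax, then KL divergence); the function names and the choice of temperature are illustrative assumptions, and the T² gradient-correction factor sometimes applied in practice is omitted:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(z_teacher, z_student, T=2.0):
    # KL(p_teacher || p_student) between the softened distributions.
    p = softmax(z_teacher, T)
    q = softmax(z_student, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits produce zero divergence; differing logits a positive loss.
z = np.array([2.0, 1.0, 0.1])
print(distillation_loss(z, z))
print(distillation_loss(z, np.array([0.5, 1.0, 2.0])) > 0.0)
```

Minimizing this KL term pushes the student's softened output distribution toward the teacher's.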
Tags
Deep Learning (in Machine learning)
Data Science
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is training a small 'student' model to mimic the predictions of a larger, pre-trained 'teacher' model. The training objective is to make the student's final, pre-activation output vector as similar as possible to the teacher's. If z_t is the teacher's output vector and z_s is the student's output vector for the same input, which of the following loss functions correctly implements this objective?
Rationale for Using Logits in Distillation Loss
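The related question asks for a loss that matches the pre-activation outputs (logits) directly, rather than the softmax probabilities. A minimal NumPy sketch of that logit-matching objective, assuming a single input and mean squared error as the similarity measure (the function name is illustrative):

```python
import numpy as np

def logit_matching_loss(z_t, z_s):
    # Mean squared error computed directly on the pre-activation logits,
    # with no softmax applied to either output vector.
    return float(np.mean((z_t - z_s) ** 2))

z_t = np.array([3.0, 1.0, -2.0])  # teacher logits
z_s = np.array([2.5, 1.5, -2.0])  # student logits
print(logit_matching_loss(z_t, z_s))
```

Unlike the softmax-then-divergence formulation, this loss penalizes differences in the raw logits, so it is minimized only when z_s equals z_t exactly.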