Learn Before
  • Objective Function for Student Model Training via Knowledge Distillation

Using KL Divergence for Knowledge Distillation Loss

An alternative approach to knowledge distillation loss is to directly minimize the discrepancy between the output probability distributions of the teacher and student models. The Kullback-Leibler (KL) divergence is a common choice for formulating this loss, quantifying how much the student's distribution diverges from the teacher's. Note that KL divergence is not a true distance: it is asymmetric, so KL(teacher ∥ student) generally differs from KL(student ∥ teacher).
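As a concrete sketch of this loss (the distributions below are made-up toy values, not taken from the text), the KL divergence between a teacher distribution and a student distribution over the same vocabulary can be computed directly from its definition:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); assumes strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher and student output distributions over a 3-token vocabulary.
teacher = [0.70, 0.20, 0.10]
student = [0.60, 0.25, 0.15]

loss = kl_divergence(teacher, student)  # distillation loss for this single input
```

In distillation the "forward" direction KL(teacher ∥ student) is typically used, so the student is penalized most heavily where the teacher assigns high probability; swapping the arguments gives a different value because KL is asymmetric.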

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Cross-Entropy Loss for Knowledge Distillation

  • Using KL Divergence for Knowledge Distillation Loss

  • A research team is training a small, efficient 'student' model to replicate the behavior of a large, powerful 'teacher' model. The team's goal is to find the optimal parameters for the student model ($\hat{\theta}$) by minimizing a loss function over a dataset of simplified inputs ($\mathcal{D}'$), as defined by the following objective:

    $$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x}' \in \mathcal{D}'} \text{Loss}(\text{Pr}^t(\cdot|\cdot), \text{Pr}_{\theta}^s(\cdot|\cdot), \mathbf{x}')$$

    where $\text{Pr}^t$ is the teacher's output probability distribution and $\text{Pr}_{\theta}^s$ is the student's.

    If the team mistakenly configures the training process to use the teacher's original, complex dataset instead of the intended simplified dataset $\mathcal{D}'$, which of the following outcomes is the most direct and likely consequence for the student model?

  • Critique of a Modified Training Objective

  • Diagnosing a Knowledge Distillation Training Issue
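As an illustrative sketch of the objective above (the teacher distribution, learning rate, and iteration count here are my own toy assumptions, not from the text), a student distribution can be fit to a single teacher distribution by gradient descent on the KL loss, using the fact that the gradient of KL(teacher ∥ softmax(z)) with respect to the student logits z is simply (student − teacher):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for strictly positive distributions p and q."""
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([0.70, 0.20, 0.10])  # hypothetical teacher output
z = np.zeros(3)                          # student logits, uniform start

for _ in range(500):
    student = softmax(z)
    z -= 0.5 * (student - teacher)       # gradient step on KL(teacher || student)

student = softmax(z)                     # converges toward the teacher distribution
```

In real distillation this update would run over every input in the simplified dataset and backpropagate through the full student network; the single-distribution loop above only illustrates why minimizing the KL term drives the student's distribution toward the teacher's.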

Learn After
  • KL Divergence Loss for Knowledge Distillation

  • A compact computational model is being trained to replicate the probabilistic outputs of a large, established reference model. The training process aims to minimize the dissimilarity between the two models' full output distributions for any given input. Below are the output probability distributions from the reference model and three potential outputs from the compact model for the same input.

    Reference Model Output: [0.70, 0.20, 0.10]

    Which of the compact model outputs below demonstrates the most successful replication of the reference model's output distribution, considering the goal is to match the entire distribution, not just the most likely outcome?

    Compact Model - Output A: [0.65, 0.22, 0.13]

    Compact Model - Output B: [0.70, 0.10, 0.20]

    Compact Model - Output C: [0.50, 0.30, 0.20]

  • Rationale for Distribution Matching in Model Training

  • Knowledge Distillation Loss using KL Divergence

  • Analyzing Model Training Scenarios
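The distribution-matching exercise above (the "Learn After" question with Outputs A, B, and C) can be checked numerically by computing KL(reference ∥ output) for each candidate and picking the smallest:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

reference = [0.70, 0.20, 0.10]
outputs = {
    "A": [0.65, 0.22, 0.13],
    "B": [0.70, 0.10, 0.20],
    "C": [0.50, 0.30, 0.20],
}

scores = {name: kl_divergence(reference, q) for name, q in outputs.items()}
best = min(scores, key=scores.get)  # "A"
```

Output A wins because it stays close to the reference on every component; Output B matches the most likely outcome exactly but swaps the probabilities of the other two, and the KL divergence penalizes that mismatch across the entire distribution.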