Benchmark Model in Knowledge Distillation

The benchmark model in knowledge distillation is trained with a joint loss function that combines the distillation loss and the student loss. The student loss is typically the cross-entropy between the ground-truth label $y$ and the soft logits of the student model $p(z_s, T = 1)$, i.e. the softmax of the student logits $z_s$ at temperature $T = 1$. It is written $L_{CE}(y, p(z_s, T = 1))$.
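As a rough illustration of how the two terms fit together, here is a minimal sketch in plain Python. The text above only fixes the form of the student loss. The rest is assumed for the example: the distillation loss is taken to be the cross-entropy between teacher and student outputs softened at a shared temperature `T`, the two terms are mixed with a hypothetical weight `alpha`, and the teacher logits are called `z_t`. Other formulations (a KL-divergence term, or a `T**2` scaling factor) are also common.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax p(z, T)."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy -sum_i target_i * log(probs_i)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def student_loss(y_onehot, z_s):
    """L_CE(y, p(z_s, T=1)): ground-truth label vs. student output at T = 1."""
    return cross_entropy(y_onehot, softmax(z_s, T=1.0))

def distillation_loss(z_t, z_s, T):
    """Assumed form: cross-entropy between teacher and student outputs at temperature T."""
    return cross_entropy(softmax(z_t, T), softmax(z_s, T))

def joint_loss(y_onehot, z_s, z_t, T=4.0, alpha=0.5):
    """Assumed weighting: alpha * student loss + (1 - alpha) * distillation loss."""
    return (alpha * student_loss(y_onehot, z_s)
            + (1.0 - alpha) * distillation_loss(z_t, z_s, T))

# Toy 3-class example; the logits are made up for illustration.
y = [0.0, 1.0, 0.0]    # one-hot ground-truth label
z_s = [1.0, 2.0, 0.5]  # student logits
z_t = [0.5, 3.0, 0.2]  # teacher logits
print(joint_loss(y, z_s, z_t))
```

Setting `alpha=1.0` recovers the plain student loss, while smaller values put more weight on matching the teacher.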

Updated 2026-05-10

Tags

Deep Learning (in Machine learning)

Data Science

Computing Sciences