Formula
Distillation Loss for Feature-Based Knowledge
The distillation loss for feature-based knowledge transfer matches the feature maps of intermediate layers in the teacher and student models. It is calculated as:
$$L_{FeaD}\big(f_t(x), f_s(x)\big) = \mathcal{L}_F\Big(\Phi_t\big(f_t(x)\big), \Phi_s\big(f_s(x)\big)\Big)$$
Where:
- $f_t(x)$ and $f_s(x)$ are the feature maps of the intermediate layers of the teacher and student models, respectively.
- $\Phi_t(\cdot)$ and $\Phi_s(\cdot)$ are transformation functions applied when the feature maps of the two models have different shapes.
- $\mathcal{L}_F(\cdot)$ is the similarity function used to match the two models' transformed feature maps.
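The loss defined above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: feature maps are taken to have shape (channels, height, width); $\Phi_t$ is assumed to be the identity; $\Phi_s$ is assumed to be a channel projection in the style of a 1x1 convolution, so that a student map with fewer channels can be compared with the teacher's; and $\mathcal{L}_F$ is assumed to be the mean squared difference (an L2-style distance). The function names `identity`, `make_channel_projection`, `l2_similarity`, and `feature_distillation_loss` are hypothetical.

```python
import numpy as np


def identity(f):
    """Phi_t (assumed): leave the teacher feature map unchanged."""
    return f


def make_channel_projection(weight):
    """Build Phi_s (assumed): a 1x1-convolution-style linear map over channels.

    `weight` has shape (C_t, C_s) and maps a (C_s, H, W) student feature map
    to (C_t, H, W) so that it matches the teacher's shape. In real training
    this weight would be learned together with the student; here it is fixed.
    """
    def phi(f):
        # For every spatial position (h, w): out[t] = sum_s weight[t, s] * f[s]
        return np.einsum("ts,shw->thw", weight, f)
    return phi


def l2_similarity(a, b):
    """L_F (assumed): mean squared difference between two same-shaped maps."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch after transformation: {a.shape} vs {b.shape}")
    return float(np.mean((a - b) ** 2))


def feature_distillation_loss(f_t, f_s, phi_t, phi_s, loss_f=l2_similarity):
    """L_FeaD(f_t, f_s) = L_F(Phi_t(f_t), Phi_s(f_s))."""
    return loss_f(phi_t(f_t), phi_s(f_s))


# Usage: a teacher layer with 8 channels and a student layer with 4 channels.
rng = np.random.default_rng(0)
f_t = rng.standard_normal((8, 4, 4))        # teacher feature map (C_t=8, H=4, W=4)
f_s = rng.standard_normal((4, 4, 4))        # student feature map (C_s=4, H=4, W=4)
weight = 0.1 * rng.standard_normal((8, 4))  # hypothetical projection weights

loss = feature_distillation_loss(f_t, f_s, identity, make_channel_projection(weight))
print(f"feature distillation loss: {loss:.4f}")
```

Passing `identity` for both transformations here would raise a `ValueError`, which is exactly the situation the transformation functions exist to handle: the raw maps have different channel counts and cannot be compared directly.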
Updated 2026-05-08
Tags
Deep Learning (in Machine learning)
Data Science
Computing Sciences