Distillation Loss for Feature-Based Knowledge

The distillation loss for feature-based knowledge transfer measures how closely the student model's intermediate-layer feature maps match those of the teacher model. It is calculated as:

$$L_{FeaD}(f_t(x), f_s(x)) = L_F(\phi_t(f_t(x)), \phi_s(f_s(x)))$$

Where:

  • $f_t(x)$ and $f_s(x)$ are the feature maps of the intermediate layers of the teacher and student models, respectively.
  • $\phi_t(\cdot)$ and $\phi_s(\cdot)$ are transformation functions applied when the feature maps of the two models have different shapes.
  • $L_F(\cdot)$ is the similarity function used for matching the models' feature maps.
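
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: feature maps are flattened to lists of floats, $\phi_t$ is assumed to be the identity, $\phi_s$ is assumed to be a linear projection that lifts the student features to the teacher's dimension, and $L_F$ is assumed to be mean squared error. All function names and the example numbers are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def identity(features: Vector) -> Vector:
    # Assumed phi_t: the teacher features are used unchanged.
    return list(features)


def make_linear_projection(weights: List[Vector]) -> Callable[[Vector], Vector]:
    # Assumed phi_s: a linear map (one weight row per output dimension)
    # that brings the student features to the teacher's shape.
    def project(features: Vector) -> Vector:
        return [sum(w * f for w, f in zip(row, features)) for row in weights]

    return project


def mse(a: Vector, b: Vector) -> float:
    # Assumed L_F: mean squared error between two equally sized vectors.
    if len(a) != len(b):
        raise ValueError("feature maps must have the same shape after transformation")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation_loss(
    f_t: Vector,
    f_s: Vector,
    phi_t: Callable[[Vector], Vector],
    phi_s: Callable[[Vector], Vector],
    similarity: Callable[[Vector, Vector], float] = mse,
) -> float:
    # L_FeaD(f_t(x), f_s(x)) = L_F(phi_t(f_t(x)), phi_s(f_s(x)))
    return similarity(phi_t(f_t), phi_s(f_s))


# Hypothetical example: 4-dimensional teacher features, 2-dimensional student features.
teacher_features = [1.0, 2.0, 3.0, 4.0]
student_features = [1.0, 1.0]
projection_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 2.0],
]
phi_s = make_linear_projection(projection_weights)

loss = feature_distillation_loss(teacher_features, student_features, identity, phi_s)
print(loss)
```

In practice the projection weights would typically be learned together with the student, and other choices of $L_F$ are possible; the structure of the computation stays the same.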

Updated 2026-05-08

Tags

Deep Learning (in Machine learning)

Data Science

Computing Sciences