
Distillation Loss for Relation-Based Knowledge

The distillation loss for relation-based knowledge, defined on the relations between feature maps, is formulated as:

$$L_{RelD}(f_t, f_s) = L_{R^1}\big(\psi_t(\acute{f_t}, \check{f_t}),\ \psi_s(\acute{f_s}, \check{f_s})\big)$$

Where:

  • $f_t$ and $f_s$ are the feature maps of the teacher and student models, respectively.
  • $\acute{f_t}$ and $\check{f_t}$ form a pair of feature maps chosen from the teacher.
  • $\acute{f_s}$ and $\check{f_s}$ form a pair of feature maps chosen from the student.
  • $\psi_t(\cdot)$ and $\psi_s(\cdot)$ are the similarity functions applied to pairs of feature maps from the teacher and student, respectively.
  • $L_{R^1}(\cdot)$ is the correlation function between the teacher and student feature maps.
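
As a rough illustration of how the terms fit together, the sketch below computes the loss for one pair of teacher feature maps and one pair of student feature maps. The formula leaves the similarity functions and the correlation function unspecified, so the choices here are assumptions made only for the example: cosine similarity stands in for both similarity functions, and a squared difference stands in for the correlation function. The names `cosine_similarity` and `relation_distillation_loss` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Assumed similarity function (psi): cosine similarity of two flattened feature maps."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def relation_distillation_loss(ft_pair, fs_pair,
                               psi_t=cosine_similarity,
                               psi_s=cosine_similarity):
    """Relation-based loss for one teacher pair and one student pair.

    ft_pair: the two feature maps chosen from the teacher.
    fs_pair: the two feature maps chosen from the student.
    The squared difference used as L_{R^1} is an assumption for illustration.
    """
    rel_t = psi_t(*ft_pair)  # relation between the two teacher feature maps
    rel_s = psi_s(*fs_pair)  # relation between the two student feature maps
    return (rel_t - rel_s) ** 2


# Random arrays stand in for real feature maps.
rng = np.random.default_rng(0)
ft_a, ft_b = rng.normal(size=(2, 8, 4, 4))  # teacher pair, 8 channels each
fs_a, fs_b = rng.normal(size=(2, 4, 4, 4))  # student pair, 4 channels each

loss = relation_distillation_loss((ft_a, ft_b), (fs_a, fs_b))
print(loss)
```

Because each similarity function reduces a pair of feature maps to a single relation value, the teacher and student feature maps do not need to share a shape in this sketch; only the relations are compared.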


Updated 2026-05-10

Tags

Deep Learning (in Machine learning)

Data Science

Computing Sciences