
Distillation Loss

The distillation loss for relation-based knowledge, defined over relations between feature maps, is (a minimal code sketch follows the list of symbols):

  • $L_{RelD}(f_t, f_s) = L_{R^1}(\psi_t(\acute{f_t}, \check{f_t}), \psi_s(\acute{f_s}, \check{f_s}))$

  • $f_t, f_s$ are feature maps of the teacher and student models

  • $\acute{f_t}, \check{f_t}$ is a pair of feature maps chosen from the teacher

  • $\acute{f_s}, \check{f_s}$ is a pair of feature maps chosen from the student

  • $\psi_t(\cdot), \psi_s(\cdot)$ are similarity functions for pairs of feature maps from the teacher and student models

  • $L_{R^1}(\cdot)$ is the correlation function between the teacher and student feature maps
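
A minimal PyTorch sketch of this loss, assuming $\psi_t, \psi_s$ are cosine similarity over flattened feature maps and $L_{R^1}$ is mean-squared error; both are common but illustrative choices, and the function names here are hypothetical:

```python
import torch
import torch.nn.functional as F

def pairwise_similarity(f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
    # psi(.): assumed here to be cosine similarity between two
    # flattened feature maps, computed per sample in the batch.
    a = f_a.flatten(start_dim=1)
    b = f_b.flatten(start_dim=1)
    return F.cosine_similarity(a, b, dim=1)

def relation_distillation_loss(ft_pair, fs_pair) -> torch.Tensor:
    # L_RelD: compare the teacher's pairwise relation psi_t with the
    # student's psi_s. L_{R^1} is taken to be mean-squared error here
    # (an assumption; any correlation/distance function could be used).
    ft_a, ft_b = ft_pair  # pair of feature maps chosen from the teacher
    fs_a, fs_b = fs_pair  # pair of feature maps chosen from the student
    psi_t = pairwise_similarity(ft_a, ft_b)  # teacher relation, shape (batch,)
    psi_s = pairwise_similarity(fs_a, fs_b)  # student relation, shape (batch,)
    return F.mse_loss(psi_s, psi_t)

# Toy usage with random feature maps (batch of 8, 16x16 spatial size):
ft_pair = (torch.randn(8, 64, 16, 16), torch.randn(8, 64, 16, 16))
fs_pair = (torch.randn(8, 32, 16, 16), torch.randn(8, 32, 16, 16))
loss = relation_distillation_loss(ft_pair, fs_pair)
```

Because $\psi(\cdot)$ reduces each pair of feature maps to a scalar relation per sample, the teacher and student feature maps need not share channel dimensions (64 vs. 32 in the toy usage above); only the relations themselves are compared.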


Updated 2022-10-22

Tags

Deep Learning (in Machine Learning)

Data Science