Learn Before
Concept

Distillation Loss

The distillation loss for feature-based knowledge transfer is

$$L_{FeaD}(f_t(x), f_s(x)) = L_F(\phi_t(f_t(x)), \phi_s(f_s(x)))$$

- $f_t(x)$ and $f_s(x)$ are the feature maps of the intermediate layers of the teacher and student models
- $\phi_t(f_t(x))$ and $\phi_s(f_s(x))$ are transformation functions, applied when the feature maps of the two models have different shapes
- $L_F(\cdot)$ is the similarity function used to match the models' feature maps
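The formula above can be sketched in a few lines of NumPy. Everything here is illustrative: the feature shapes, the random linear projections standing in for $\phi_t$ and $\phi_s$, and the choice of mean-squared error as $L_F$ are assumptions, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intermediate feature maps: the teacher layer is wider
# than the student layer, so their shapes differ.
f_t = rng.standard_normal((4, 64))  # teacher features, batch of 4
f_s = rng.standard_normal((4, 32))  # student features, batch of 4

# phi_t / phi_s: transformations onto a shared shape. Here a random
# linear projection maps the teacher down to the student's width;
# the student needs no change.
W_t = rng.standard_normal((64, 32)) / np.sqrt(64)

def phi_t(f):
    return f @ W_t  # project teacher features to 32 dims

def phi_s(f):
    return f        # student features already have the target shape

def l_f(a, b):
    """Similarity function L_F: mean-squared error between feature maps."""
    return float(np.mean((a - b) ** 2))

def feature_distillation_loss(f_t, f_s):
    """L_FeaD(f_t(x), f_s(x)) = L_F(phi_t(f_t(x)), phi_s(f_s(x)))."""
    return l_f(phi_t(f_t), phi_s(f_s))

loss = feature_distillation_loss(f_t, f_s)
```

In training, this scalar would be added to the student's task loss and minimized by gradient descent, pulling the student's intermediate representations toward the teacher's.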


Updated 2022-10-22

Tags

Deep Learning (in Machine learning)

Data Science
