Formula

Distillation Loss for Response-Based Knowledge

The distillation loss for response-based knowledge transfer is calculated as the divergence between the logit vectors of the teacher and student models. This can be formally expressed as $L_{ResD}(z_t, z_s) = L_R(z_t, z_s)$, where $z_t$ and $z_s$ are the logits from the teacher and student models, respectively, and $L_R(\cdot)$ is the divergence loss function.
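
As an illustration, here is a minimal PyTorch sketch of this loss. The text leaves $L_R$ unspecified, so KL divergence is assumed here; the function name `response_distillation_loss` and the temperature parameter are illustrative additions, not from the original formula:

```python
import torch
import torch.nn.functional as F

def response_distillation_loss(z_t: torch.Tensor,
                               z_s: torch.Tensor,
                               temperature: float = 1.0) -> torch.Tensor:
    """L_ResD(z_t, z_s) = L_R(z_t, z_s), with L_R chosen as KL divergence.

    z_t: teacher logits, shape (batch, num_classes)
    z_s: student logits, shape (batch, num_classes)
    temperature: softens both distributions (an assumed, common extension).
    """
    # Turn logits into (log-)probability distributions before comparing them.
    log_p_s = F.log_softmax(z_s / temperature, dim=-1)  # student log-probs
    p_t = F.softmax(z_t / temperature, dim=-1)          # teacher probs
    # KL(p_t || p_s), averaged over the batch; scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

# Example usage with random logits for a batch of 8 examples, 10 classes.
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10)
loss = response_distillation_loss(teacher_logits, student_logits)
```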
