Concept

Using KL Divergence for Knowledge Distillation Loss

An alternative approach to the knowledge distillation loss is to directly minimize the discrepancy between the output probability distributions of the teacher and student models. The Kullback-Leibler (KL) divergence is the metric most commonly used to formulate this loss. Strictly speaking, KL divergence is not a true distance: it is asymmetric, quantifying how much the student's distribution diverges from the teacher's reference distribution.
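Concretely, for each training example the loss is the KL divergence from the student distribution p_S to the teacher distribution p_T over the output vocabulary V:

$$ D_{\mathrm{KL}}(p_T \,\|\, p_S) = \sum_{v \in V} p_T(v) \,\log\frac{p_T(v)}{p_S(v)} $$

Below is a minimal sketch of this loss in PyTorch, using two common conventions from the distillation literature (temperature-softened logits and scaling by T²); the function name and the default temperature value are illustrative assumptions, not part of this concept's definition.

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 2.0) -> torch.Tensor:
    """KL-divergence distillation loss (illustrative sketch).

    Softens both distributions with a temperature, then computes
    KL(teacher || student) averaged over the batch.
    """
    # Student as log-probabilities, teacher as probabilities, both softened.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # F.kl_div expects (input=log-probs, target=probs); "batchmean"
    # divides by the batch size, matching the mathematical definition.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # Common convention (Hinton et al., 2015): scale by T^2 so gradient
    # magnitudes stay comparable across temperature settings.
    return loss * temperature ** 2

# Example: batch of 4 examples over a 10-way output distribution.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = kl_distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student's logits
```

Note the argument order: because KL divergence is asymmetric, the teacher's probabilities serve as the target distribution, so the student is penalized most where the teacher assigns high probability but the student does not.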

Tags: Ch.3 Prompting - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences