1Cademy - Evaluating Student Model Performance in Knowledge Distillation

Learn Before

KL Divergence Loss for Knowledge Distillation

Case Study

Evaluating Student Model Performance in Knowledge Distillation

Given the scenario below, analyze the output probability distributions. Based on the goal of minimizing the Kullback-Leibler (KL) divergence between the teacher and student models, which student model is better aligned with the teacher's output? Justify your reasoning.
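As a rough illustration of the comparison this question asks for, the sketch below computes the KL divergence from a teacher distribution to each of two student distributions and picks the smaller one. The scenario's actual probabilities are not reproduced on this page, so the values of `teacher`, `student_a`, and `student_b` are hypothetical placeholders, and the helper name `kl_divergence` is an assumption made for this example. It also assumes the teacher-to-student direction, KL(teacher || student), which is the usual convention in distillation.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Assumes p and q cover the same classes in the same order, and that
    q is strictly positive wherever p is positive.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    # Terms with p_i == 0 contribute nothing to the sum, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical values for illustration only; not taken from the case study.
teacher = [0.7, 0.2, 0.1]
student_a = [0.6, 0.3, 0.1]
student_b = [0.4, 0.4, 0.2]

kl_a = kl_divergence(teacher, student_a)
kl_b = kl_divergence(teacher, student_b)

# The student with the lower divergence is better aligned with the teacher.
better = "A" if kl_a < kl_b else "B"
print(f"KL(teacher || A) = {kl_a:.4f}")
print(f"KL(teacher || B) = {kl_b:.4f}")
print(f"Better aligned student: {better}")
```

With these placeholder numbers, student A ends up with the lower divergence because its probabilities stay closer to the teacher's on the classes the teacher weights most heavily. The same comparison applies to whatever distributions the scenario provides.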

Updated 2025-10-08

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences