Short Answer

Rationale for Distribution Matching in Model Training

A machine learning team is training a small, efficient model to perform a classification task. Instead of training it on the original dataset's 'hard' labels (e.g., 'cat', 'dog'), they train it to replicate the full probability distribution output from a much larger, more accurate model (e.g., 'cat': 90%, 'dog': 8%, 'fox': 2%). Explain why training the small model to match the entire probability distribution is often more beneficial than simply training it to predict the single correct label.
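The idea in the prompt can be made concrete with a small numeric sketch. The snippet below is a minimal illustration, not the course's reference solution: the class probabilities, the student's predictions, and the `cross_entropy` helper are all assumed for the example. It contrasts the loss computed against a one-hot "hard" label with the loss computed against the teacher's full distribution, which also penalizes the student for getting the *relative* probabilities of the wrong classes (e.g. 'dog' vs. 'fox') wrong.

```python
import numpy as np

# Hypothetical numbers matching the prompt: the large model's soft
# predictions for one input, versus the one-hot "hard" label.
teacher = np.array([0.90, 0.08, 0.02])   # P(cat), P(dog), P(fox) from the large model
hard    = np.array([1.00, 0.00, 0.00])   # one-hot label: "cat"

# An assumed snapshot of the small model's current predictions; note it
# ranks "fox" above "dog", contradicting the teacher's ordering.
student = np.array([0.70, 0.05, 0.25])

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return float(-np.sum(target * np.log(pred + eps)))

# Hard-label training only rewards probability mass on "cat".
loss_hard = cross_entropy(hard, student)      # = -log(0.70)

# Distribution matching additionally penalizes the student's wrong
# dog-vs-fox ordering, transferring the teacher's inter-class knowledge.
loss_soft = cross_entropy(teacher, student)
```

Under these assumed numbers, the soft-target loss is larger than the hard-label loss precisely because the student disagrees with the teacher about the non-target classes, which is the extra "dark knowledge" the hard label throws away.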

Updated 2025-10-02

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science