Short Answer

Rationale for Distribution Matching in Model Training

A machine learning team is training a small, efficient model to perform a classification task. Instead of training it on the original dataset's 'hard' labels (e.g., 'cat', 'dog'), they train it to replicate the full probability distribution output from a much larger, more accurate model (e.g., 'cat': 90%, 'dog': 8%, 'fox': 2%). Explain why training the small model to match the entire probability distribution is often more beneficial than simply training it to predict the single correct label.
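The idea in the prompt can be made concrete with a small numeric sketch. The snippet below is a minimal illustration, not the course's reference solution: the class probabilities, the student's predictions, and the `cross_entropy` helper are all assumed for the example. It contrasts the loss computed against a one-hot "hard" label with the loss computed against the teacher's full distribution, which also penalizes the student for getting the *relative* probabilities of the wrong classes (e.g. 'dog' vs. 'fox') wrong.

```python
import numpy as np

# Hypothetical numbers matching the prompt: the large model's soft
# predictions for one input, versus the one-hot "hard" label.
teacher = np.array([0.90, 0.08, 0.02])   # P(cat), P(dog), P(fox) from the large model
hard    = np.array([1.00, 0.00, 0.00])   # one-hot label: "cat"

# An assumed snapshot of the small model's current predictions; note it
# ranks "fox" above "dog", contradicting the teacher's ordering.
student = np.array([0.70, 0.05, 0.25])

def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return float(-np.sum(target * np.log(pred + eps)))

# Hard-label training only rewards probability mass on "cat".
loss_hard = cross_entropy(hard, student)      # = -log(0.70)

# Distribution matching additionally penalizes the student's wrong
# dog-vs-fox ordering, transferring the teacher's inter-class knowledge.
loss_soft = cross_entropy(teacher, student)
```

Under these assumed numbers, the soft-target loss is larger than the hard-label loss precisely because the student disagrees with the teacher about the non-target classes, which is the extra "dark knowledge" the hard label throws away.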

Updated 2025-10-02

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science