Short Answer

Rationale for Model Compression Technique

A machine learning team has a large, high-performing language model that is too slow and resource-intensive for a real-time application. They decide to train a much smaller model from scratch. Instead of training this new, smaller model solely on the original dataset's 'hard labels' (the single correct class per example), they use the large model to generate 'soft labels' (probability distributions over all possible classes) for the same data and use these as the training targets. Explain the primary reason why this approach is often more effective for training the smaller model.
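The setup described here is knowledge distillation (Hinton et al., 2015). As a concrete illustration of what training on soft labels can look like, below is a minimal PyTorch-style sketch of a distillation loss that blends the teacher's softened distribution with the ground-truth labels. The function name `distillation_loss` and the `temperature` and `alpha` defaults are illustrative assumptions, not part of the question.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-label (teacher) and hard-label (ground-truth) losses.

    temperature and alpha are hypothetical defaults; in practice they
    are tuned per task.
    """
    # Soft targets: the teacher's full probability distribution,
    # softened with temperature > 1 so that non-argmax classes
    # contribute visibly to the target.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions; the
    # T^2 factor keeps gradient magnitudes comparable as T changes.
    soft_loss = F.kl_div(log_student, soft_targets,
                         reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the single correct class.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example usage with made-up shapes: a batch of 8 samples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)   # from the frozen large model
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

On the temperature design choice: raising the temperature flattens the teacher's distribution so that its small probabilities on incorrect classes remain visible to the student, which is why distillation losses typically soften both distributions before comparing them.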

Updated 2025-10-05

Tags: Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science