Analyzing Model Training Scenarios
A team is training a compact model to replicate the behavior of a larger, more complex model. The training process is designed to minimize the dissimilarity between the full output probability distributions of the two models for any given input. The team observes the following outputs for the same input image during two different training experiments.
Related
KL Divergence Loss for Knowledge Distillation
A compact computational model is being trained to replicate the probabilistic outputs of a large, established reference model. The training process aims to minimize the dissimilarity between the two models' full output distributions for any given input. Below are the output probability distributions from the reference model and three potential outputs from the compact model for the same input.
Reference Model Output:
[0.70, 0.20, 0.10]
Which of the compact model outputs below demonstrates the most successful replication of the reference model's output distribution, considering the goal is to match the entire distribution, not just the most likely outcome?
Compact Model - Output A:
[0.65, 0.22, 0.13]
Compact Model - Output B:
[0.70, 0.10, 0.20]
Compact Model - Output C:
[0.50, 0.30, 0.20]
Rationale for Distribution Matching in Model Training
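One way to compare the three compact model outputs in the question above is to score each against the reference distribution with a divergence measure. The sketch below is illustrative only: it assumes the dissimilarity is the KL divergence taken in the direction KL(reference || compact), measured in nats, which is a common choice in knowledge distillation but is not stated by the question itself. The helper name kl_divergence and the variable names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions of equal length.

    Assumption: every q[i] is positive wherever p[i] is positive.
    Terms with p[i] == 0 contribute nothing and are skipped.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Distributions copied from the question.
reference = [0.70, 0.20, 0.10]
candidates = {
    "A": [0.65, 0.22, 0.13],
    "B": [0.70, 0.10, 0.20],
    "C": [0.50, 0.30, 0.20],
}

# Score each candidate; a lower value means a closer match to the reference.
scores = {name: kl_divergence(reference, q) for name, q in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"Output {name}: KL = {score:.4f}")
print("Closest match:", best)
```

Under this assumed measure, Output A scores lowest. Output B reproduces the top probability exactly but swaps the other two, so it scores worse than A, which illustrates why matching only the most likely outcome is not sufficient.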
Knowledge Distillation Loss using KL Divergence