A compact computational model is being trained to replicate the probabilistic outputs of a large, established reference model. The training process aims to minimize the dissimilarity between the two models' full output distributions for any given input. Below are the output probability distributions from the reference model and three potential outputs from the compact model for the same input.
Reference Model Output: [0.70, 0.20, 0.10]
Which of the compact model outputs below demonstrates the most successful replication of the reference model's output distribution, considering the goal is to match the entire distribution, not just the most likely outcome?
Compact Model - Output A: [0.65, 0.22, 0.13]
Compact Model - Output B: [0.70, 0.10, 0.20]
Compact Model - Output C: [0.50, 0.30, 0.20]
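One standard way to score this kind of full-distribution matching is the KL divergence used in knowledge distillation losses (as the related notes below suggest). As a minimal sketch, assuming forward KL(P ‖ Q) with natural logarithms and the reference model as P, the following Python snippet computes the divergence for each candidate output; the variable names are illustrative, not part of the original question.

    import math

    # Reference (teacher) distribution P and candidate (student) distributions Q.
    reference = [0.70, 0.20, 0.10]
    candidates = {
        "A": [0.65, 0.22, 0.13],
        "B": [0.70, 0.10, 0.20],
        "C": [0.50, 0.30, 0.20],
    }

    def kl_divergence(p, q):
        # Forward KL divergence: KL(P || Q) = sum_i p_i * log(p_i / q_i).
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    for name, q in candidates.items():
        print(f"Output {name}: KL(P || Q) = {kl_divergence(reference, q):.4f}")

Under this measure the approximate scores are A ≈ 0.007, B ≈ 0.069, and C ≈ 0.085, so Output A tracks the entire reference distribution most closely, even though Output B reproduces the top probability exactly.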
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
KL Divergence Loss for Knowledge Distillation
Rationale for Distribution Matching in Model Training
Knowledge Distillation Loss using KL Divergence
Analyzing Model Training Scenarios