Learn Before
Evaluating Model Mimicry Performance
Based on the provided outputs, which student model (A or B) is currently a better imitation of the teacher model for this specific input? Justify your reasoning by explaining how the dissimilarity function would likely interpret these distributions.
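One way to make the comparison concrete is to compute the dissimilarity score for each student directly. The sketch below assumes the dissimilarity function is the KL divergence from the teacher to the student, as the related item on knowledge distillation loss suggests. The three distributions are hypothetical placeholders, not the outputs the question refers to, and the name `kl_divergence` is illustrative.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same categories.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    division by zero when the student assigns zero probability.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical values for illustration only (assumption, not from the source).
teacher = [0.7, 0.2, 0.1]
student_a = [0.6, 0.3, 0.1]   # keeps the teacher's ranking, slightly flatter
student_b = [0.1, 0.2, 0.7]   # puts most mass on the teacher's least likely class

score_a = kl_divergence(teacher, student_a)
score_b = kl_divergence(teacher, student_b)

# The student with the lower score is the closer imitation for this input.
better = "A" if score_a < score_b else "B"
print(f"KL(teacher || A) = {score_a:.4f}")
print(f"KL(teacher || B) = {score_b:.4f}")
print(f"Closer imitation: student {better}")
```

With these placeholder values, student A scores lower because it places high probability where the teacher does. KL divergence penalizes a student heavily when it assigns little probability to a category the teacher considers likely, which is why student B scores much higher.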
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineering team is developing a compact, fast model to replicate the predictions of a much larger, more complex model for a 5-category classification task. They use a specific mathematical function to calculate a 'dissimilarity score' between the probability distributions produced by the two models for each input. A lower score indicates the outputs are more similar. After several training epochs, they observe the average dissimilarity score on a validation dataset has significantly decreased. What is the most accurate interpretation of this observation?
A small, efficient model is being trained to emulate the behavior of a large, powerful model on a 3-category classification task. A mathematical function is used to calculate a 'dissimilarity score' between the probability distributions produced by the two models for a given input, where a higher score indicates a greater difference. For which of the following scenarios would this dissimilarity score be the highest?
Knowledge Distillation Loss using KL Divergence
Evaluating Model Mimicry Performance