Learn Before
An engineering team is developing a compact, fast model to replicate the predictions of a much larger, more complex model for a 5-category classification task. They use a specific mathematical function to calculate a 'dissimilarity score' between the probability distributions produced by the two models for each input. A lower score indicates the outputs are more similar. After several training epochs, they observe the average dissimilarity score on a validation dataset has significantly decreased. What is the most accurate interpretation of this observation?
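The "specific mathematical function" described here is commonly the KL divergence, as the related card titles suggest. A minimal sketch in Python/NumPy, using hypothetical teacher and student distributions for a 5-category task (the specific probability values are illustrative assumptions, not from the card):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions.
    Lower values mean q (student) is closer to p (teacher)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps guards against log(0) / division by zero
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical teacher output over 5 categories
teacher = np.array([0.70, 0.10, 0.10, 0.05, 0.05])

# Student outputs early vs. late in training (illustrative)
student_early = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
student_late  = np.array([0.60, 0.15, 0.10, 0.08, 0.07])

print(kl_divergence(teacher, student_early))  # higher dissimilarity score
print(kl_divergence(teacher, student_late))   # lower score: distributions match better
```

A decreasing average score on validation data therefore indicates the student's output distributions are converging toward the teacher's, which is exactly the interpretation the question probes.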
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A small, efficient model is being trained to emulate the behavior of a large, powerful model on a 3-category classification task. A mathematical function is used to calculate a 'dissimilarity score' between the probability distributions produced by the two models for a given input, where a higher score indicates a greater difference. For which of the following scenarios would this dissimilarity score be the highest?
Knowledge Distillation Loss using KL Divergence
Evaluating Model Mimicry Performance