Learn Before
Function to Measure Differences Between Models
In architectures that combine small and large models, a mathematical function is often defined to quantify the difference between their outputs. In knowledge distillation, for example, the Kullback-Leibler (KL) divergence can measure the dissimilarity between the probability distributions that a large 'teacher' model and a smaller 'student' model produce for the same input; training the student to minimize this divergence teaches it to mimic the teacher's behavior.
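As a concrete illustration, the sketch below computes a distillation loss from KL divergence using PyTorch. The logits, temperature, and variable names are illustrative assumptions, not values from the source:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits from the teacher and student for one input (5 classes).
teacher_logits = torch.tensor([[4.0, 1.0, 0.5, 0.2, 0.1]])
student_logits = torch.tensor([[2.5, 1.5, 1.0, 0.5, 0.3]])

T = 2.0  # temperature: softens both distributions before comparison

# F.kl_div expects log-probabilities as input and probabilities as target,
# so this computes KL(teacher || student) over the softened distributions.
student_log_probs = F.log_softmax(student_logits / T, dim=-1)
teacher_probs = F.softmax(teacher_logits / T, dim=-1)

loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T**2
print(loss.item())
```

Scaling by T² is a common convention (from Hinton et al.'s distillation work) that keeps gradient magnitudes comparable as the temperature changes.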

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Visual Diagram of Cascading Inference
Analysis of a Hybrid AI System for Customer Support
A company is implementing a system where user queries are first processed by a small, fast model. If the initial result does not meet a certain quality threshold, the query is then passed to a larger, more accurate model. What is the most critical trade-off the company must consider when setting this quality threshold?
Impact of Small Model Improvement
Learn After
An engineering team is developing a compact, fast model to replicate the predictions of a much larger, more complex model for a 5-category classification task. They use a specific mathematical function to calculate a 'dissimilarity score' between the probability distributions produced by the two models for each input. A lower score indicates the outputs are more similar. After several training epochs, they observe that the average dissimilarity score on a validation dataset has decreased significantly. What is the most accurate interpretation of this observation?
A small, efficient model is being trained to emulate the behavior of a large, powerful model on a 3-category classification task. A mathematical function is used to calculate a 'dissimilarity score' between the probability distributions produced by the two models for a given input, where a higher score indicates a greater difference. For which of the following scenarios would this dissimilarity score be the highest? (A numeric sketch after this list illustrates the behavior.)
Knowledge Distillation Loss using KL Divergence
Evaluating Model Mimicry Performance
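Both 'Learn After' questions above turn on how such a dissimilarity score behaves. The following sketch, using hypothetical probability values for a 3-category task, shows that KL divergence stays small when the student roughly agrees with the teacher and grows large when the student is confident in a category the teacher considers unlikely:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions; higher means more dissimilar.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

teacher = [0.90, 0.05, 0.05]

# Student roughly agrees with the teacher: low dissimilarity.
print(kl_divergence(teacher, [0.80, 0.10, 0.10]))  # ~0.04
# Student is confident in a different category: high dissimilarity.
print(kl_divergence(teacher, [0.05, 0.90, 0.05]))  # ~2.46
```

A steadily falling average score on held-out data therefore indicates that the student's output distributions are converging toward the teacher's.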