Learn Before
Mechanisms of Knowledge Transfer
In the context of training a smaller 'student' model from a larger 'teacher' model, describe two distinct types of 'knowledge' that can be transferred from the teacher to the student, beyond just the final predictions. For each type, briefly explain how it helps the student model learn more effectively.
Tags
Deep Learning (in Machine Learning)
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Components of a Knowledge Distillation System
Extensions
Applications
KD Workflow
Distilling Prompting Knowledge into Soft Prompts
Efficient Model Deployment for Mobile Applications
A machine learning team is developing a compact model for a mobile application. They have a large, highly accurate 'teacher' model and a smaller 'student' model architecture. Instead of training the student model directly on the original dataset with its ground-truth labels (e.g., 'this image is a cat'), they train it to mimic the full output probability distribution of the teacher model (e.g., '90% cat, 5% dog, 1% tiger...'). Why is this technique often more effective than training the student from scratch on the original hard labels?
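The soft-target idea in the scenario above can be sketched in a few lines. This is a minimal illustration, not a production training loop: the class names, logits, and temperature value are all made up for the example, and a real pipeline would typically use a framework loss such as KL divergence between temperature-scaled distributions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures smooth the
    distribution, exposing the teacher's relative class similarities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy of the student's predictions against a target
    distribution (hard one-hot or soft teacher probabilities)."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

# Illustrative teacher logits for the classes [cat, dog, tiger].
teacher_logits = [5.0, 2.0, 1.0]

# Hard label: one-hot for 'cat'. It carries no information about how
# cat-like the other classes are.
hard_target = [1.0, 0.0, 0.0]

# Soft target: the teacher's full distribution at a raised temperature.
# Non-target classes now get nonzero probability, so each one
# contributes a gradient signal to the student.
soft_target = softmax(teacher_logits, temperature=4.0)

student_logits = [2.0, 1.0, 0.5]

# Distillation loss: match the teacher's whole distribution...
distill_loss = cross_entropy(soft_target,
                             softmax(student_logits, temperature=4.0))
# ...versus the hard-label loss, which only sees the 'cat' entry.
hard_loss = cross_entropy(hard_target, softmax(student_logits))
```

The key contrast is visible in the two loss terms: `hard_loss` depends only on the probability the student assigns to 'cat', while `distill_loss` also penalizes getting the dog/tiger ranking wrong, which is the extra 'dark knowledge' the teacher provides.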
Mechanisms of Knowledge Transfer
Context Distillation