Learn Before
Knowledge Distillation
Knowledge distillation is a model compression and acceleration technique. It addresses the challenge of deploying deep learning models on devices with limited resources (e.g., mobile devices and embedded systems), where the computational complexity and storage requirements of large models are prohibitive.
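As a concrete illustration of how this compression is typically carried out (a common soft-target formulation, assumed here rather than stated above): a large, accurate 'teacher' model's output distribution serves as a training target for a smaller 'student' model, usually by minimizing

\[
\mathcal{L}_{\mathrm{KD}} \;=\; (1-\alpha)\,\mathrm{CE}\big(y,\ \sigma(z_s)\big) \;+\; \alpha\, T^{2}\,\mathrm{KL}\big(\sigma(z_t/T)\,\big\|\,\sigma(z_s/T)\big)
\]

where z_s and z_t are the student and teacher logits, \sigma is the softmax, y is the ground-truth label, T is a temperature that smooths both distributions, and \alpha weights the soft-target term against the ordinary hard-label loss.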
Tags
Deep Learning (in Machine learning)
Data Science
Foundations of Large Language Models Course
Computing Sciences
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Learn After
Components of a Knowledge Distillation System
Extensions
Applications
KD Workflow
Distilling Prompting Knowledge into Soft Prompts
Efficient Model Deployment for Mobile Applications
A machine learning team is developing a compact model for a mobile application. They have a large, highly accurate 'teacher' model and a smaller 'student' model architecture. Instead of training the student model directly on the original dataset with its ground-truth labels (e.g., 'this image is a cat'), they train it to mimic the full output probability distribution of the teacher model (e.g., '90% cat, 5% dog, 1% tiger...'). Why is this technique often more effective for the student model's performance than training it from scratch on the original labels? (A code sketch of this soft-target setup appears after this list.)
Mechanisms of Knowledge Transfer
Context Distillation
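The mobile-deployment scenario above ('Efficient Model Deployment for Mobile Applications') describes training a student to mimic the teacher's full output distribution. Below is a minimal, hypothetical PyTorch-style sketch of one such training step; the model objects, data tensors, and hyperparameter values (temperature, alpha) are illustrative assumptions, not details from the text.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-target (teacher-mimicking) loss and a hard-label loss."""
    # Temperature-softened distributions: T > 1 spreads probability mass
    # across the non-target classes, exposing the teacher's learned
    # similarities between classes (e.g., cat vs. tiger vs. dog).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between the softened distributions, scaled by T^2 so
    # its gradients stay comparable to the hard-label term.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def distillation_step(student, teacher, optimizer, images, labels):
    """One optimization step: the teacher is frozen, only the student is updated."""
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```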