Learn Before
  • Knowledge Distillation


KD Workflow

The teacher's knowledge can take several forms. In response-based distillation, the logits of the large deep model serve as the teacher knowledge; in feature-based distillation, activations, neurons, or features of intermediate layers guide the student's learning; and in relation-based distillation, the student learns relationships between feature maps, layers, or sample pairs rather than mimicking individual outputs.
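As a concrete illustration, here is a minimal sketch of the response-based case, assuming PyTorch; the default temperature and weighting are illustrative choices, not values prescribed by the survey.

```python
# A minimal sketch of response-based knowledge distillation, assuming PyTorch.
# The temperature and alpha defaults are illustrative, not prescribed values.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft loss (mimic the teacher) with a hard loss (fit labels)."""
    # Soften both output distributions with a temperature, then match the
    # student to the teacher with KL divergence.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean")
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = soft_loss * temperature ** 2
    # Ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

Feature-based variants typically add a regression term (for example, mean-squared error) between chosen intermediate activations of teacher and student, while relation-based variants match pairwise similarities instead.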

Updated 2022-10-22

Contributor: Lois Wong (University of California, Berkeley)

References


  • Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge Distillation: A Survey. International Journal of Computer Vision.

Tags

Deep Learning (in Machine Learning)

Data Science

Related
  • Components of a Knowledge Distillation System

  • Extensions

  • Applications

  • Distilling Prompting Knowledge into Soft Prompts

  • Efficient Model Deployment for Mobile Applications

  • A machine learning team is developing a compact model for a mobile application. They have a large, highly accurate 'teacher' model and a smaller 'student' model architecture. Instead of training the student directly on the original dataset with its ground-truth labels (e.g., 'this image is a cat'), they train it to mimic the teacher's full output probability distribution (e.g., '90% cat, 5% dog, 1% tiger...'). Why is this technique often more effective for the student's performance than training it from scratch on the original labels? (A numeric sketch contrasting the two kinds of targets follows this list.)

  • Mechanisms of Knowledge Transfer

  • Context Distillation

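As a rough numeric illustration of the soft-target question above (all probabilities are invented for illustration):

```python
# Hard one-hot label vs. the teacher's soft output for the same image.
# All numbers are invented for illustration.
hard_label    = {"cat": 1.00, "dog": 0.00, "tiger": 0.00}
teacher_probs = {"cat": 0.90, "dog": 0.05, "tiger": 0.01}  # rest spread over other classes

# The ranking 0.90 > 0.05 > 0.01 encodes how strongly the image resembles
# each class, similarity structure (the "dark knowledge") that the one-hot
# label discards and that gives the student a richer training signal.
```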
Learn After
  • Key Challenge
