Learn Before
Context Distillation
Context distillation is a knowledge distillation method for adapting large language models (LLMs) to follow simplified instructions. A student model is trained to make predictions from user inputs paired with simplified contexts (such as condensed instructions), while a well-trained, instruction-following teacher model processes the same inputs together with the original, detailed instructions. The student learns by minimizing the loss between its predictions and those produced by the teacher, so the behavior elicited by the detailed instructions is distilled into the student's responses to the simplified context.
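The sketch below is a minimal, illustrative PyTorch rendering of this setup, not the course's reference implementation. The ToyLM class, the vocabulary size, and the randomly generated token ids are placeholders standing in for real pretrained LLMs and a shared tokenizer, and KL divergence is assumed as one common choice for the student-teacher loss. The teacher scores the user input given the full instruction, the student scores it given the condensed instruction, and only the student is updated.

```python
# Minimal sketch of context distillation; a toy LM stands in for both models.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size (assumption for this sketch)

class ToyLM(nn.Module):
    """Stand-in language model: embeds tokens and predicts next-token logits."""
    def __init__(self, vocab=VOCAB, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, token_ids):                 # (batch, seq_len)
        return self.proj(self.embed(token_ids))   # (batch, seq_len, vocab)

teacher = ToyLM().eval()   # frozen, instruction-following "teacher"
student = ToyLM()          # trainable "student"
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

# Hypothetical token ids: the teacher sees the full, detailed instruction;
# the student sees only a condensed instruction. Both see the same user input.
full_instruction  = torch.randint(0, VOCAB, (1, 32))   # detailed prompt
short_instruction = torch.randint(0, VOCAB, (1, 4))    # condensed prompt
user_input        = torch.randint(0, VOCAB, (1, 16))

teacher_ids = torch.cat([full_instruction, user_input], dim=1)
student_ids = torch.cat([short_instruction, user_input], dim=1)

with torch.no_grad():
    # Teacher's predictive distribution over the user-input positions only.
    teacher_logits = teacher(teacher_ids)[:, -user_input.size(1):, :]

student_logits = student(student_ids)[:, -user_input.size(1):, :]

# The student minimizes the KL divergence between its predictions and the teacher's.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the teacher stays frozen (the no_grad block above) and the distillation loss is applied only to the positions the student must actually predict, here the user-input span.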
Tags
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Components of a Knowledge Distillation System
Extensions
Applications
KD Workflow
Distilling Prompting Knowledge into Soft Prompts
Efficient Model Deployment for Mobile Applications
A machine learning team is developing a compact model for a mobile application. They have a large, highly accurate 'teacher' model and a smaller 'student' model architecture. Instead of training the student model directly on the original dataset with its ground-truth labels (e.g., 'this image is a cat'), they train it to mimic the full output probability distribution of the teacher model (e.g., '90% cat, 5% dog, 1% tiger...'). Why is this technique often more effective for the student model's performance than training it from scratch on the original labels?
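A minimal PyTorch sketch of the soft-label training described in this scenario follows, under assumed toy models and data: the layer sizes, temperature T, and weighting ALPHA are illustrative choices rather than values from the course. It combines a KL term that matches the teacher's temperature-softened probability distribution with an ordinary cross-entropy term on the ground-truth hard labels.

```python
# Minimal sketch of knowledge distillation with soft labels; toy classifiers
# stand in for the large teacher and the compact student.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10   # e.g., cat, dog, tiger, ... (assumption for this sketch)
T = 4.0            # temperature: softens the teacher's distribution
ALPHA = 0.7        # weight on the soft (teacher) loss vs. the hard-label loss

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                        nn.Linear(256, NUM_CLASSES)).eval()   # frozen teacher
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(),
                        nn.Linear(32, NUM_CLASSES))           # compact student
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

features = torch.randn(16, 128)                  # a batch of inputs
labels = torch.randint(0, NUM_CLASSES, (16,))    # ground-truth hard labels

with torch.no_grad():
    teacher_logits = teacher(features)           # full output distribution source
student_logits = student(features)

# Soft loss: match the teacher's temperature-softened probability distribution.
soft_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

# Hard loss: ordinary cross-entropy on the ground-truth labels.
hard_loss = F.cross_entropy(student_logits, labels)

loss = ALPHA * soft_loss + (1 - ALPHA) * hard_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The soft targets expose how the teacher ranks the incorrect classes (e.g., "dog" being far more likely than "tiger" for a cat image), which is information the one-hot labels alone do not carry.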
Mechanisms of Knowledge Transfer
Context Distillation