Characteristics of Teacher and Student Models in Knowledge Distillation
In knowledge distillation, the teacher model is the larger, more capable model, while the student model is deliberately designed to be smaller and more computationally efficient so that it can be deployed in practical applications.
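To make these roles concrete, the following is a minimal sketch of a single distillation step in PyTorch; it is an illustration under assumed settings, not a recipe taken from the card. The toy `teacher` and `student` networks, the dummy input batch, and the temperature value are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher (larger) and student (smaller) models, for illustration only.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both output distributions with a temperature, then match them with KL divergence."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(8, 32)  # dummy batch of inputs

with torch.no_grad():           # the teacher is frozen; only the student is trained
    teacher_logits = teacher(x)
student_logits = student(x)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
optimizer.step()
```

In practice this distillation term is usually combined with the ordinary task loss on ground-truth labels, and the student is trained over many batches rather than the single step shown here.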
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Soft Prompt Optimization by Minimizing Prediction Dissimilarity
Optimizing Language Model API Costs
A team is training a set of learnable, continuous parameters to serve as a compact substitute for a long, detailed textual instruction set for a language model. The goal is for these compact parameters to guide the model to produce the same quality of output as the original long instructions when given any user input. Which of the following best describes the core objective of this training process?
Characteristics of Teacher and Student Models in Knowledge Distillation
In the framework of learning a soft prompt via knowledge distillation to compress a longer context, match each component with its corresponding role in the process.
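The soft prompt cards above describe training a small set of continuous vectors to stand in for a long textual instruction set. The sketch below is a rough illustration of that objective against a frozen stand-in model with dummy data; the names (`TinyLM`, `soft_prompt`) and all dimensions are hypothetical assumptions, not the course's specific formulation. The trainable vectors are optimized so that the model's predictions with the soft prompt match its predictions with the full instructions.

```python
import torch
import torch.nn.functional as F

# Hypothetical dimensions for illustration; a real setup would use an actual language model.
vocab_size, hidden_dim, num_soft_tokens = 1000, 64, 8

# Stand-in for a frozen language model: maps a sequence of embeddings to output logits.
class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_dim, vocab_size)
    def forward(self, embeds):                  # embeds: (batch, seq_len, hidden_dim)
        return self.proj(embeds.mean(dim=1))    # pooled logits over the vocabulary

lm = TinyLM()
for p in lm.parameters():
    p.requires_grad_(False)                     # the model itself stays frozen

embed = torch.nn.Embedding(vocab_size, hidden_dim)   # frozen token embeddings (stand-in)
embed.weight.requires_grad_(False)

# The only trainable parameters: a short block of continuous "soft prompt" vectors.
soft_prompt = torch.nn.Parameter(torch.randn(num_soft_tokens, hidden_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-2)

long_instructions = torch.randint(0, vocab_size, (120,))   # dummy long instruction tokens
user_input = torch.randint(0, vocab_size, (1, 16))         # dummy user query tokens

# Teacher context: full instructions + user input; student context: soft prompt + user input.
with torch.no_grad():
    teacher_embeds = torch.cat([embed(long_instructions).unsqueeze(0),
                                embed(user_input)], dim=1)
    teacher_logits = lm(teacher_embeds)

student_embeds = torch.cat([soft_prompt.unsqueeze(0), embed(user_input)], dim=1)
student_logits = lm(student_embeds)

# Minimize prediction dissimilarity: KL divergence between the two output distributions.
loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                F.softmax(teacher_logits, dim=-1), reduction="batchmean")
loss.backward()
optimizer.step()
```

Only `soft_prompt` receives gradients; the model and its embedding table stay frozen, which is what makes the learned prompt a compact, reusable substitute for the long instructions.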
Learn After
Evaluating a Model Deployment Strategy
A company aims to deploy a sophisticated language model on mobile phones, but its current state-of-the-art model is too large and slow for these devices. It plans to train a new, more compact model to mimic the behavior of the larger one. Which statement best analyzes the trade-offs and roles in this knowledge transfer scenario?
Rationale for Model Roles in Knowledge Transfer