Learn Before
A research lab has developed a very large, powerful 'teacher' language model that excels at complex, multi-step reasoning tasks. They want to deploy this reasoning capability in a mobile application, which requires a much smaller, faster 'student' model. Using the principles of knowledge distillation, what would be the most effective training objective for the student model to ensure it learns the reasoning process of the teacher, not just the final answers?
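One common way to frame such an objective is token-level distillation over the teacher's full reasoning traces: the student is trained to match the teacher's temperature-softened output distribution at every step of the chain of thought, not only at the final answer token. The sketch below is a minimal, illustrative implementation of that idea; the function names and the temperature value are assumptions for demonstration, not a prescribed recipe.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher temperature flattens the
    distribution and exposes the teacher's relative token preferences."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean token-level KL(teacher || student) over a reasoning trace.

    Both arrays have shape (sequence_length, vocab_size). Averaging the
    KL divergence across *all* positions in the teacher's step-by-step
    trace makes the student imitate the reasoning process itself, rather
    than only reproducing the final-answer token.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(np.mean(kl)) * temperature**2
```

In practice the student's logits would come from a forward pass of the small model and the teacher's from the large model (or from cached teacher generations), with this loss minimized by gradient descent; the loss is zero exactly when the student's distribution matches the teacher's at every position.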
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Deploying a Computationally-Intensive Reasoning Model
Evaluating a Model Compression Strategy for Reasoning