Learn Before
An engineer is training a compact 'student' model to replicate the behavior of a larger 'teacher' model. The training process aims to minimize a loss function that measures the difference between the output probability distributions of the two models for any given input. If the loss value remains high throughout the training, what is the most direct conclusion?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Deconstructing the Knowledge Transfer Loss Function
An engineer is training a compact 'student' model to replicate the behavior of a larger 'teacher' model. The training process aims to minimize a loss function that measures the difference between the output probability distributions of the two models for any given input. If the loss value remains high throughout the training, what is the most direct conclusion?
Analyzing the Components of a Model Mimicry Loss Function