Multiple Choice

In a knowledge distillation setup, a smaller 'student' model is trained to mimic the output probability distribution of a larger 'teacher' model for a given input. The training objective is to minimize the Kullback-Leibler (KL) divergence between the two distributions. The standard loss function is defined as $Loss_A = \text{KL}(\text{Teacher Distribution} \,\|\, \text{Student Distribution})$. A researcher proposes an alternative loss function, $Loss_B = \text{KL}(\text{Student Distribution} \,\|\, \text{Teacher Distribution})$. How would minimizing $Loss_B$ instead of $Loss_A$ most likely change the student model's behavior?
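The distinction the question probes is the asymmetry of KL divergence: $Loss_A$ averages the log-ratio under the teacher's distribution (mass-covering), while $Loss_B$ averages it under the student's (zero-forcing, mode-seeking). A minimal sketch of the two loss directions, assuming a PyTorch setup; the function names `loss_a`/`loss_b` and the toy logits are illustrative and not part of the question:

```python
import torch
import torch.nn.functional as F

def loss_a(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    """Forward KL: Loss_A = KL(teacher || student).

    The expectation is taken under the teacher, so the student is penalized
    wherever the teacher assigns mass but the student does not
    (mass-covering behavior).
    """
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # F.kl_div(input, target) computes target * (log target - input),
    # i.e. KL(target || exp(input)).
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def loss_b(teacher_logits: torch.Tensor, student_logits: torch.Tensor) -> torch.Tensor:
    """Reverse KL: Loss_B = KL(student || teacher).

    The expectation is taken under the student, so the student can drop
    low-probability teacher modes entirely and concentrate its mass on
    dominant ones (zero-forcing / mode-seeking behavior).
    """
    student_probs = F.softmax(student_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    return (student_probs * (student_log_probs - teacher_log_probs)).sum(dim=-1).mean()

# Toy illustration: a bimodal teacher vs. a student concentrated on one mode.
teacher = torch.tensor([[2.0, 2.0, -2.0]])   # roughly equal mass on classes 0 and 1
student = torch.tensor([[3.0, -3.0, -3.0]])  # nearly all mass on class 0
print(loss_a(teacher, student).item())  # large: student misses teacher's second mode
print(loss_b(teacher, student).item())  # smaller: student stays where the teacher has mass
```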



Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science