Learn Before
Evaluating a Loss Function for a Machine Translation Task
Based on the structure of the provided loss function and the context of the task, evaluate the practical feasibility of the engineer's proposed training approach. Justify your conclusion by referencing the specific component of the formula that presents the main challenge.
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational Infeasibility of Full Output Summation in Distillation Loss
A student model is trained to mimic a teacher model by minimizing the following loss function, which measures the dissimilarity between their output probability distributions for a given input x:
Loss = -Σ_y Pr^t(y|x) log Pr^s(y|x)
In this formula, Pr^t(y|x) is the teacher's probability for an output sequence y, Pr^s(y|x) is the student's probability for the same sequence, and the summation runs over all possible output sequences. What is the primary function of the summation (Σ_y) over the entire space of possible outputs?
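To get a feel for the size of the space this summation ranges over, the sketch below counts candidate output sequences. The helper name `num_sequences`, the vocabulary size of 32,000, and the length cap of 20 tokens are illustrative assumptions, not values taken from the course material.

```python
import itertools


def num_sequences(vocab_size: int, max_len: int) -> int:
    """Count all non-empty token sequences of length 1..max_len."""
    return sum(vocab_size ** n for n in range(1, max_len + 1))


# Toy case: 3 tokens, sequences of up to 4 tokens, enumerated explicitly.
toy_vocab = ["a", "b", "c"]
enumerated = sum(
    1
    for n in range(1, 5)
    for _ in itertools.product(toy_vocab, repeat=n)
)
print(enumerated, num_sequences(3, 4))  # both counts are 3 + 9 + 27 + 81 = 120

# Assumed, more realistic setting: 32,000 tokens, outputs of up to 20 tokens.
realistic = num_sequences(32_000, 20)
print(f"about 10^{len(str(realistic)) - 1} candidate sequences")
```

Each term of the sum needs one probability from the teacher and one from the student, so an exact evaluation would need that many model evaluations for every training input. This is why the full summation is typically replaced by an approximation, such as evaluating the loss on a small set of sequences sampled from the teacher.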
Evaluating a Loss Function for a Machine Translation Task
A student model is being trained to replicate the output distribution of a teacher model using the loss function:
Loss = -Σ_y Pr^t(y|x) log Pr^s(y|x)
Suppose for a given input, there are only three possible output sequences: A, B, and C. The teacher model assigns the following probabilities:
Pr^t(A) = 0.8, Pr^t(B) = 0.15, Pr^t(C) = 0.05
Two different student models produce the following distributions:
- Student 1: Pr^s(A) = 0.6, Pr^s(B) = 0.3, Pr^s(C) = 0.1
- Student 2: Pr^s(A) = 0.6, Pr^s(B) = 0.1, Pr^s(C) = 0.3
Without calculating the exact loss, which student model will achieve a lower loss value, and why?
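The qualitative answer can be checked numerically once it has been reasoned out. The sketch below assumes the loss is the usual cross-entropy between teacher and student, -Σ_y Pr^t(y) log Pr^s(y), with natural logarithms; the function name `cross_entropy` and the dictionary layout are illustrative choices.

```python
import math


def cross_entropy(teacher: dict, student: dict) -> float:
    """Assumed loss: -sum over outputs y of Pr^t(y) * log Pr^s(y)."""
    return -sum(p_t * math.log(student[y]) for y, p_t in teacher.items())


teacher = {"A": 0.8, "B": 0.15, "C": 0.05}
student_1 = {"A": 0.6, "B": 0.3, "C": 0.1}
student_2 = {"A": 0.6, "B": 0.1, "C": 0.3}

loss_1 = cross_entropy(teacher, student_1)
loss_2 = cross_entropy(teacher, student_2)

print(f"Student 1: {loss_1:.4f}")  # 0.7044
print(f"Student 2: {loss_2:.4f}")  # 0.8142
```

Both students assign the same probability to A, so they differ only in how they split the remaining mass between B and C. The teacher weights B three times as heavily as C, so Student 1, which gives B the larger share, is penalized less. Under this assumed loss the gap is exactly 0.1 × ln 3, about 0.11.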