A reward model is being trained to classify text segments. It uses the following loss function for a single segment, where a positive score indicates a desirable classification and a negative score indicates an undesirable one: Loss = max(0, 1 - (model_score * label)). The label is +1 for desirable segments and -1 for undesirable ones. If a segment with a ground-truth label of +1 receives a score of 0.3 from the model, what is the calculated loss for this segment?
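The arithmetic behind this question can be checked with a minimal sketch. The function name `hinge_loss` and the extra example inputs are illustrative assumptions, not part of the original question; only the formula max(0, 1 - (model_score * label)) comes from the text above.

```python
def hinge_loss(model_score: float, label: int) -> float:
    """Hinge loss for a single segment: max(0, 1 - model_score * label).

    `label` is +1 for desirable segments and -1 for undesirable ones.
    (Hypothetical helper name, used here only for illustration.)
    """
    return max(0.0, 1.0 - model_score * label)


# The case from the question: label +1, score 0.3.
# The score has the correct sign but falls short of the margin of 1,
# so the loss is 1 - 0.3 * 1.
question_loss = hinge_loss(0.3, +1)
print(round(question_loss, 2))  # 0.7

# Illustrative extra cases (assumed inputs, not from the question):
confident_correct = hinge_loss(1.5, +1)  # margin satisfied, loss clipped to 0
wrong_label_case = hinge_loss(0.3, -1)   # score has the wrong sign, loss exceeds 1
```

Under this loss, a correctly signed score still incurs a penalty until `model_score * label` reaches at least 1, which is why a score of 0.3 on a +1 segment does not give zero loss.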
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing Reward Model Performance with Hinge Loss
Conditions for Zero Hinge Loss in a Reward Model