When training a reward model on segment-level scores using a regression loss, the primary objective is to ensure the model's predicted scores for different segments maintain the same relative order (ranking) as the target scores, even if the absolute values of the predictions are consistently different from the targets.
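One way to test this statement against a standard squared-error regression loss is to build predictions that keep exactly the same ranking as the targets but are all shifted by a constant. The sketch below uses illustrative target values and a hypothetical offset of 0.5; the helper names `mse` and `ranking` are likewise illustrative and not taken from the source. It checks whether the ranking is preserved and how much loss remains.

```python
# Illustrative targets; predictions keep the same ranking as the targets
# but are all shifted up by a constant (hypothetical) offset.
targets = [0.9, 0.1, -0.6]
offset = 0.5
predictions = [t + offset for t in targets]


def mse(preds, targs):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targs)) / len(targs)


def ranking(values):
    """Indices of the values ordered from highest to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


same_order = ranking(predictions) == ranking(targets)
loss = mse(predictions, targets)
print(same_order, round(loss, 4))  # True 0.25
```

The ranking is identical, yet the loss stays at the squared offset (0.25). A squared-error objective therefore keeps pushing the predictions toward the target values themselves, not just toward the correct ordering.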
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Segment-Based Rating Loss Function
A team is training a model to predict a quality score for individual segments of a generated text. The training process is designed as a regression task, aiming to minimize the difference between the model's predicted scores and pre-calculated target scores for each segment. After one training step, the model's performance on three specific segments is as follows:
- Segment 1: Target Score = 0.9, Predicted Score = 0.8
- Segment 2: Target Score = 0.1, Predicted Score = 0.5
- Segment 3: Target Score = -0.6, Predicted Score = -0.7
Assuming a standard regression loss function (like squared error) is used, which segment will contribute the most to the loss calculation in this step, thereby having the largest impact on the model's parameter updates?
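As a quick check, the sketch below computes each segment's squared error and the gradient of that error with respect to the prediction, 2 · (prediction − target), using the three target/prediction pairs above. It assumes a plain per-segment squared error; any averaging over segments would scale every term equally and leave the comparison unchanged.

```python
# Target and predicted scores for the three segments in the example.
segments = {
    "Segment 1": (0.9, 0.8),
    "Segment 2": (0.1, 0.5),
    "Segment 3": (-0.6, -0.7),
}

# Squared error per segment: (prediction - target) ** 2.
sq_errors = {name: (pred - target) ** 2 for name, (target, pred) in segments.items()}

# Gradient of the squared error w.r.t. the prediction: 2 * (prediction - target).
grads = {name: 2 * (pred - target) for name, (target, pred) in segments.items()}

for name in segments:
    print(f"{name}: squared error = {sq_errors[name]:.2f}, "
          f"d(loss)/d(pred) = {grads[name]:+.2f}")

largest = max(sq_errors, key=sq_errors.get)
print("Largest contributor:", largest)  # Segment 2
```

Segment 2 has an error of 0.4 and a squared error of 0.16. Segments 1 and 3 each have an error of 0.1 and a squared error of 0.01. Because the gradient grows linearly with the error, Segment 2 also produces the largest gradient signal for the parameter update.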
Analyzing Reward Model Parameter Updates