Segment-Based Rating Loss Function
When segment-level rating scores are available, a reward model can be trained using pointwise methods and a regression loss function. This loss function calculates the negative expected squared difference between the target rating score for a segment and the reward model's predicted score. The formula is expressed as: $\mathcal{L}_{\text{rating}} = -\mathbb{E}\left[\left(\phi_k - r(x, y, \bar{y}_k)\right)^2\right]$. In this equation, $\phi_k$ is the target rating score for segment $\bar{y}_k$, and $r(x, y, \bar{y}_k)$ is the reward predicted by the model for that segment given the prompt $x$ and full output $y$.
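A minimal Python sketch of this loss, assuming the target and predicted scores for a batch of segments are already available as plain lists and that the expectation is approximated by a batch mean. The function name `rating_loss` and the sample scores are illustrative and do not come from the source.

```python
def rating_loss(targets, predictions):
    """Negative mean squared difference between target and predicted segment scores."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and equal in length")
    # One squared difference per segment.
    squared_diffs = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    # The batch mean stands in for the expectation; the minus sign follows the definition above.
    return -sum(squared_diffs) / len(squared_diffs)


# Hypothetical scores for three segments (not taken from the source).
loss = rating_loss([1.0, 0.0, 0.5], [0.5, 0.0, 0.5])
print(loss)
```

Because of the leading minus sign, the value is never positive and reaches zero only when every prediction matches its target, so under this sign convention training pushes the value upward toward zero.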

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Segment-Based Rating Loss Function
A team is training a model to predict a quality score for individual segments of a generated text. The training process is designed as a regression task, aiming to minimize the difference between the model's predicted scores and pre-calculated target scores for each segment. After one training step, the model's performance on three specific segments is as follows:
- Segment 1: Target Score = 0.9, Predicted Score = 0.8
- Segment 2: Target Score = 0.1, Predicted Score = 0.5
- Segment 3: Target Score = -0.6, Predicted Score = -0.7
Assuming a standard regression loss function (like squared error) is used, which segment will contribute the most to the loss calculation in this step, thereby having the largest impact on the model's parameter updates?
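The comparison can be checked with a short Python sketch, assuming a plain per-segment squared error with no weighting or averaging across segments. The variable names are illustrative.

```python
# (target, predicted) pairs copied from the three segments listed above.
segments = {
    "Segment 1": (0.9, 0.8),
    "Segment 2": (0.1, 0.5),
    "Segment 3": (-0.6, -0.7),
}

# Squared error per segment: (target - predicted) ** 2.
squared_errors = {name: (t - p) ** 2 for name, (t, p) in segments.items()}

# The segment with the largest squared error dominates the loss for this step.
largest = max(squared_errors, key=squared_errors.get)

for name, err in squared_errors.items():
    print(f"{name}: {err:.2f}")
print("Largest contribution:", largest)
```

Segments 1 and 3 each miss their target by about 0.1, while Segment 2 misses by 0.4. Squaring widens that gap: Segment 2 contributes roughly sixteen times as much as either of the other two.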
Analyzing Reward Model Parameter Updates
When training a reward model on segment-level scores using a regression loss, the primary objective is to ensure the model's predicted scores for different segments maintain the same relative order (ranking) as the target scores, even if the absolute values of the predictions are consistently different from the targets.
Learn After
Unit Reward Function for Segments
Reward Model Loss Calculation
A reward model is being trained to score segments of a generated text. The training objective is to maximize a loss function defined as the negative mean squared error between the model's predicted scores and the provided target scores for each segment. If, during training, the calculated loss for a batch of segments is a value very close to zero (e.g., -0.001), what does this indicate about the model's performance on that specific batch?
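One way to read such a value, sketched in Python: negating the loss recovers the mean squared error, and its square root gives the typical size of a prediction error in score units. The variable names are illustrative, and the calculation only assumes the loss is exactly the negative mean squared error described above.

```python
import math

loss = -0.001           # example loss value from the question above
mse = -loss             # negating the loss recovers the mean squared error
rmse = math.sqrt(mse)   # root mean squared error, in the same units as the scores

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
```

Since the negative mean squared error cannot exceed zero, a value this close to zero means the predicted scores for the batch sit very near their targets.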
Behavior of the Rating Loss Function