Activity (Process)

Training a Reward Model on Segment-Level Scores via Regression Loss

Once scores for individual segments are computed, they can serve as the target values for training a reward model. Training is framed as a regression task: the model's parameters are optimized by minimizing a regression loss function (mean squared error is one common choice) that quantifies the difference between the model's predicted scores and the computed segment scores.
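The regression setup described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each segment is already encoded as a small feature vector, that the reward model is a linear function of those features, and that the regression loss is mean squared error. The names `predict`, `mse_loss`, `train`, `segment_features`, and `segment_scores` are hypothetical, as are the toy numbers. In practice the reward model would usually be a neural network trained on learned representations of the segments.

```python
from typing import List, Tuple


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    """Predicted reward for one segment (assumed linear model)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse_loss(weights: List[float], bias: float,
             segments: List[List[float]], targets: List[float]) -> float:
    """Mean squared error between predicted and target segment scores."""
    errors = [predict(weights, bias, f) - t for f, t in zip(segments, targets)]
    return sum(e * e for e in errors) / len(errors)


def train(segments: List[List[float]], targets: List[float],
          lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit the model by full-batch gradient descent on the MSE loss."""
    dim = len(segments[0])
    n = len(segments)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for features, target in zip(segments, targets):
            err = predict(weights, bias, features) - target
            # Gradient of the mean of err^2 with respect to each parameter.
            for j in range(dim):
                grad_w[j] += 2.0 * err * features[j] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy data (made up): four segments with two features each,
# and the segment-level scores used as regression targets.
segment_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
segment_scores = [0.8, 0.2, 1.0, 0.0]

initial_loss = mse_loss([0.0, 0.0], 0.0, segment_features, segment_scores)
weights, bias = train(segment_features, segment_scores)
final_loss = mse_loss(weights, bias, segment_features, segment_scores)
```

After training, `final_loss` should be far smaller than `initial_loss`, which reflects the idea that minimizing the regression loss pulls the model's predicted scores toward the computed segment scores.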

Updated 2026-05-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences