1Cademy - Total Reward as Sum of Segment-Based Scores

Learn Before

Segment-Based Reward Score Formula

Formula

Total Reward as Sum of Segment-Based Scores

The cumulative reward score for an entire output token sequence, denoted $r(\mathbf{x}, \mathbf{y})$, is the sum of the individual reward scores of all its segments:

$$r(\mathbf{x}, \mathbf{y}) = \sum_{k=1}^{n_s} r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$$

Here, $n_s$ is the total number of segments the sequence is divided into, and $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$ is the reward computed for the $k$-th segment $\bar{\mathbf{y}}_k$, given the input $\mathbf{x}$ and the full output $\mathbf{y}$. This total is then used as the sequence-level reward when training the policy model in the usual way.
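The aggregation can be illustrated with a minimal Python sketch. It assumes the per-segment rewards $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$ have already been produced by a reward model and are available as plain numbers. The function name `total_reward` and the variable names are hypothetical, chosen only for this illustration, and the scores are the ones from the three-segment example listed under Learn After.

```python
from typing import Sequence


def total_reward(segment_rewards: Sequence[float]) -> float:
    """Return r(x, y) as the sum of the n_s per-segment rewards.

    Each entry is assumed to be r(x, y, y_bar_k) for one segment,
    already computed by some reward model (not shown here).
    """
    return float(sum(segment_rewards))


# Illustrative scores for a response split into n_s = 3 segments.
segment_rewards = [0.8, -0.3, 0.5]
total = total_reward(segment_rewards)

print(f"n_s = {len(segment_rewards)}, total reward = {total:.2f}")
# prints: n_s = 3, total reward = 1.00
```

A negative segment score, such as -0.3 here, simply lowers the total. The sum applies no weighting or normalization by the number of segments.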

Updated 2026-05-03


References

Reference of Foundations of Large Language Models Course

Learn After

Application of Segment-Based Total Reward in Policy Training
A language model generates a three-segment response to a user's prompt. A separate reward model evaluates each segment, considering the full context of the prompt and the complete response, and assigns the following scores: Segment 1: 0.8, Segment 2: -0.3, Segment 3: 0.5. According to the principle of aggregating segment-based scores, what is the total reward for the entire generated response?
Analyzing Reward Model Behavior
Calculating a Missing Segment Score
