1Cademy - Segment-Based Reward Score Formula

Learn Before

Input Formulation for Segment-Based Reward Computation

Formula

Segment-Based Reward Score Formula

The reward score for the $k$-th segment of a generated output is computed by a reward model that takes the full context into account. Evaluating a segment requires three inputs: the original input prompt, the entire generated output sequence, and the segment itself. The formula is:

$$r^k = r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$$

where $r^k$ denotes the reward for the $k$-th segment, $r$ is the reward model function, $\mathbf{x}$ is the initial prompt, $\mathbf{y}$ is the complete output sequence, and $\bar{\mathbf{y}}_k$ is the segment under evaluation.
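The three-argument form of the formula can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `toy_reward_model` is a hypothetical stand-in for $r$ (a real reward model would be a learned network), and the output $\mathbf{y}$ is assumed to be the segments joined by spaces. The point is only to show that each segment is scored together with the prompt and the complete output, not in isolation.

```python
from typing import Callable, List

# Hypothetical signature for r(x, y, y_bar_k):
# (prompt, full output, segment) -> scalar score.
RewardModel = Callable[[str, str, str], float]


def toy_reward_model(x: str, y: str, segment: str) -> float:
    """Toy stand-in for r: the share of the full output's words that the
    segment accounts for. It ignores the prompt x, which a learned reward
    model would not do; x is kept only to match the formula's signature."""
    total_words = len(y.split())
    return len(segment.split()) / total_words if total_words else 0.0


def segment_rewards(r: RewardModel, x: str, segments: List[str]) -> List[float]:
    """Compute r^k = r(x, y, y_bar_k) for every segment of the output."""
    # Assumption: the complete output y is the segments joined by spaces.
    y = " ".join(segments)
    return [r(x, y, segment) for segment in segments]


prompt = "Explain photosynthesis."
segments = [
    "Plants absorb light.",
    "They make sugar from water and carbon dioxide.",
]
scores = segment_rewards(toy_reward_model, prompt, segments)
for k, score in enumerate(scores, start=1):
    print(f"r^{k} = {score:.3f}")
```

Because the toy scorer normalizes by the length of the full output, the same segment would receive a different score inside a longer or shorter output, which is one simple way the complete sequence $\mathbf{y}$ can influence $r^k$.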

Updated 2026-05-03

References

Reference of Foundations of Large Language Models Course

Learn After

Total Reward as Sum of Segment-Based Scores
Examples of Constant Segment-Based Reward Functions
A team is developing a reward model to score segments of text generated by a language model. The standard approach calculates a segment's score using the initial prompt, the complete generated output, and the specific segment being evaluated. To improve efficiency, a developer suggests modifying the process to calculate the score using only the initial prompt and the specific segment, omitting the rest of the generated output. What is the most significant analytical flaw in this modified approach?
Inputs for Segment-Based Reward Calculation
Role of Context in Segment-Based Reward
