Segment-Based Reward Computation
To obtain more granular feedback in reward modeling, the output sequence can be divided into multiple segments. A reward model then computes a separate reward score for each segment, allowing a more localized and detailed assessment of the output's quality than a single score for the whole sequence.
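The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: `score_segment` stands in for a learned reward model (here replaced by a toy length-based scorer), and sentence boundaries are assumed as one simple segmentation strategy. Each segment is scored conditioned on the prompt plus the segments generated before it.

```python
from typing import Callable, List

def split_into_segments(text: str) -> List[str]:
    """Segment an output sequence; splitting at sentence
    boundaries is one simple strategy among many."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def segment_rewards(prompt: str, output: str,
                    score_segment: Callable[[str, str], float]) -> List[float]:
    """Compute one reward per segment, conditioning each score on the
    prompt plus all earlier segments (so context accumulates)."""
    rewards = []
    context = prompt
    for seg in split_into_segments(output):
        rewards.append(score_segment(context, seg))
        context += " " + seg  # later segments see earlier ones
    return rewards

# Toy stand-in for a reward model: favors longer segments, capped at 1.0.
toy_scorer = lambda ctx, seg: min(len(seg) / 50.0, 1.0)

rs = segment_rewards(
    "Explain photosynthesis.",
    "Plants absorb light. They convert CO2 and water into sugar.",
    toy_scorer,
)
print(rs)  # one score per segment, e.g. two scores for two sentences
```

In practice the per-segment scores can be aggregated (summed or averaged) into a sequence-level reward, or used directly as step-level training signals, which is where the finer-grained error localization comes from.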
Ch.4 Alignment - Foundations of Large Language Models