Activity (Process)

Segment-Based Reward Computation

For finer-grained feedback in reward modeling, an output sequence can be divided into multiple segments. A reward model then computes a separate reward score for each segment, giving a more localized and detailed assessment of the output's quality than a single score for the whole sequence.
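
The idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fixed-length splitting rule, the function names `split_into_segments` and `segment_rewards`, the choice to pass the preceding tokens to the scorer as context, and the toy scorer that stands in for a trained reward model. Real systems may segment by sentence, reasoning step, or another criterion.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (preceding tokens, current segment) -> score.
RewardModel = Callable[[List[str], List[str]], float]


def split_into_segments(tokens: Sequence[str], segment_len: int) -> List[List[str]]:
    """Split a token sequence into consecutive fixed-length segments.

    Fixed-length splitting is an assumption; the last segment may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(tokens[i:i + segment_len])
            for i in range(0, len(tokens), segment_len)]


def segment_rewards(tokens: Sequence[str], segment_len: int,
                    reward_model: RewardModel) -> List[float]:
    """Return one reward score per segment instead of one per sequence."""
    rewards: List[float] = []
    prefix: List[str] = []
    for segment in split_into_segments(tokens, segment_len):
        # Score this segment on its own, with earlier tokens as context.
        rewards.append(reward_model(prefix, segment))
        prefix = prefix + segment
    return rewards


def toy_reward_model(prefix: List[str], segment: List[str]) -> float:
    """Stand-in scorer (not a real reward model): share of tokens not equal to 'bad'."""
    return sum(tok != "bad" for tok in segment) / len(segment)


if __name__ == "__main__":
    tokens = "the answer is bad bad ok".split()
    print(segment_rewards(tokens, 3, toy_reward_model))
```

With the toy scorer, the first three-token segment receives a higher score than the second, which localizes the low-quality part of the output to a specific segment rather than spreading it over the whole sequence.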

Updated 2026-05-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences