Concept

Fixed-Length Segmentation for Reward Modeling

One way to segment an output sequence for reward modeling is to divide it into consecutive chunks of a predefined, equal length. This is simple to implement. Its disadvantage is that the segment boundaries are arbitrary, so they may not align with the natural structure or meaning of the content. For example, a boundary can fall in the middle of a sentence or a reasoning step.
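The segmentation step can be sketched as follows. This is a minimal illustration, not the method of any particular system. It assumes the output has already been tokenized into a list. The function name `fixed_length_segments` is hypothetical, and whitespace splitting stands in for a real model tokenizer. The sketch keeps any leftover tokens as a shorter final chunk; padding or dropping that chunk would be equally valid choices.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fixed_length_segments(tokens: Sequence[T], seg_len: int) -> List[List[T]]:
    """Split a token sequence into consecutive chunks of seg_len tokens.

    Hypothetical helper for illustration. The final chunk holds any
    remainder and may therefore be shorter than seg_len.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    # Step through the sequence in strides of seg_len, slicing one chunk each time.
    return [list(tokens[i:i + seg_len]) for i in range(0, len(tokens), seg_len)]


# Whitespace tokens stand in for model tokens (an assumption for this example).
tokens = "The answer is 42 . Next we check the units .".split()
segments = fixed_length_segments(tokens, 4)
for seg in segments:
    print(seg)
```

With `seg_len = 4`, the 11 tokens become three segments. The second segment, `['.', 'Next', 'we', 'check']`, joins the period that ends the first sentence to the start of the second sentence. This is the kind of boundary misalignment described above.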


Updated 2026-05-03


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences