Learn Before
Difficulty of Obtaining Segment-Level Human Preference Data
A significant challenge in training reward models at the segment level is the difficulty of acquiring human preference data for individual segments. Established annotation pipelines ask humans to compare or rank complete responses, so large preference datasets exist at the sequence level; judging each segment separately, in the context of everything generated before it, is far more costly for annotators, and such fine-grained labels are rarely available. This makes segment-level reward model training less straightforward than its sequence-level counterpart.
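To make the contrast concrete, below is a minimal PyTorch sketch, not the course's implementation: it applies a standard Bradley-Terry pairwise loss first to sequence-level rewards, where human comparisons of whole outputs are readily collected, and then to per-segment rewards, where the required grid of per-segment judgments is exactly the data that is hard to obtain. All names, shapes, and the choice of loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_preferred, reward_dispreferred):
    """Bradley-Terry pairwise loss: -log sigmoid(r_w - r_l), batch-averaged."""
    return -F.logsigmoid(reward_preferred - reward_dispreferred).mean()

# Sequence level: one human judgment per pair of complete responses.
# Such labels are widely collected, since annotators can read two full
# outputs and simply pick the better one.
r_w = torch.randn(8)  # rewards for the preferred full responses (dummy values)
r_l = torch.randn(8)  # rewards for the dispreferred full responses
sequence_loss = pairwise_preference_loss(r_w, r_l)

# Segment level: the same loss would need a (batch, n_segments) grid of
# judgments, one per segment, each made in the context of the text generated
# so far. It is exactly this fine-grained annotation that is costly and
# rarely available in practice.
n_segments = 5
r_w_seg = torch.randn(8, n_segments)  # per-segment rewards; labels assumed to exist
r_l_seg = torch.randn(8, n_segments)
segment_loss = pairwise_preference_loss(r_w_seg, r_l_seg)

print(f"sequence-level loss: {sequence_loss.item():.3f}")
print(f"segment-level loss:  {segment_loss.item():.3f}")
```

Note that the training objective itself transfers unchanged; the bottleneck is purely the availability of the per-segment preference labels the second call presupposes.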
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Notation for a Set of Output Segments
Input Formulation for Segment-Based Reward Computation
Applying Pointwise Methods for Segment-Level Reward Modeling
Alignment as a Segment Classification Problem
Strategies for Segmenting Output Sequences in Reward Modeling
Analyzing Feedback for a Multi-Step Reasoning Task
Reward Model as an Imperfect Proxy for the Environment
Evaluating Reward Modeling Strategies for Creative Writing