Problem

Difficulty of Obtaining Segment-Level Human Preference Data

A significant challenge in training segment-level reward models is that human preference data for individual segments is difficult to acquire. Preference data is far more commonly collected for entire text sequences, where an annotator compares complete responses, so training a reward model at the segment level is less straightforward.
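To make the contrast concrete, the following is a minimal sketch of the annotation burden in each setting. It is only an illustration under stated assumptions: the names `SequencePreference`, `split_into_segments`, and `judgments_needed` are hypothetical, segments are assumed to be sentences split on periods, and segment-level labeling is assumed to need one human judgment per segment.

```python
from dataclasses import dataclass


@dataclass
class SequencePreference:
    """One sequence-level preference record: a single comparison of two full responses."""
    prompt: str
    chosen: str
    rejected: str


def split_into_segments(text: str) -> list[str]:
    # Assumption: a "segment" is a sentence, split naively on periods.
    return [part.strip() for part in text.split(".") if part.strip()]


def judgments_needed(pair: SequencePreference) -> tuple[int, int]:
    """Return (sequence-level judgments, segment-level judgments) for one pair.

    Sequence level: one comparison covers both responses.
    Segment level (assumed): every segment of both responses needs its own judgment.
    """
    sequence_level = 1
    segment_level = (
        len(split_into_segments(pair.chosen))
        + len(split_into_segments(pair.rejected))
    )
    return sequence_level, segment_level


pair = SequencePreference(
    prompt="What is the capital of France?",
    chosen="Paris is the capital of France. It lies on the Seine.",
    rejected="Lyon is the capital of France. It lies on the Rhone. It is large.",
)
print(judgments_needed(pair))
```

Under these assumptions the segment-level count grows with response length while the sequence-level count stays at one per pair, which is one way to see why segment-level data is harder to collect.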

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences