Concept

Dual Role of the RLHF Reward Model: Ranking-Based Training, Scoring-Based Application

The reward model in RLHF plays a dual role. During training, it is optimized with a pairwise ranking objective: given two responses to the same prompt, it learns to score the preferred one higher, which makes it sensitive to subtle quality differences between outputs. In the application phase, however, it assigns an independent, continuous scalar score to each input-output pair. This shift from relative comparison (ranking) to absolute evaluation (scoring) is what provides the nuanced, continuous feedback needed to guide the RL optimization of the LLM.
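The two phases can be sketched as follows. This is a minimal illustration, assuming a PyTorch-style `reward_model(prompt, response)` callable that returns a scalar tensor; the function and variable names are illustrative, not taken from any specific implementation. Training uses a Bradley-Terry style pairwise loss, -log sigmoid(r_chosen - r_rejected), which constrains only the relative ordering of scores; at application time the same model is called once per pair to produce an absolute reward.

```python
import torch
import torch.nn.functional as F


def pairwise_ranking_loss(reward_model, prompt, chosen, rejected):
    """Training phase: Bradley-Terry style pairwise ranking objective.

    The model only needs to rank the preferred (chosen) response above the
    dispreferred (rejected) one; the absolute scale of the scores is not
    constrained by this loss.
    """
    r_chosen = reward_model(prompt, chosen)      # scalar score for preferred output
    r_rejected = reward_model(prompt, rejected)  # scalar score for dispreferred output
    # -log sigmoid(r_chosen - r_rejected) is minimized when the margin is large.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


@torch.no_grad()
def score(reward_model, prompt, response):
    """Application phase: one independent scalar per (prompt, response) pair,
    used as the reward signal when optimizing the LLM with RL (e.g. PPO)."""
    return reward_model(prompt, response)
```

Because the ranking loss fixes only score differences, the absolute scale of the resulting rewards is arbitrary, which is why the scalar scores are often normalized or baselined before being fed into the RL update.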

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models
