Semantic Completeness in RLHF Reward Models

In Reinforcement Learning from Human Feedback (RLHF), the reward model assumes that both the input prompt $\mathbf{x}$ and the generated output $\mathbf{y}$ are complete texts. Because of this, the reward model evaluates the relationship between inputs and outputs that carry full semantic content, rather than assessing partial or incomplete fragments.
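A minimal sketch of this assumption, using a toy scoring heuristic in place of a learned reward model (the `EOS` marker, the function name `reward`, and the word-overlap scoring are all illustrative assumptions, not anything from the source): the reward function accepts only a complete prompt-response pair and rejects a partial fragment.

```python
# Toy sketch (not a trained model): a reward model r(x, y) that scores
# only complete (prompt, response) pairs, as assumed in RLHF.
EOS = "<eos>"  # hypothetical end-of-sequence marker

def reward(prompt: str, response: str) -> float:
    """Return a scalar reward for a COMPLETE (prompt, response) pair.

    Raises ValueError if the response is a partial fragment (not
    EOS-terminated), mirroring the assumption that the reward model
    evaluates full semantic content, not incomplete text.
    """
    if not response.endswith(EOS):
        raise ValueError("reward model expects a complete, EOS-terminated response")
    text = response[: -len(EOS)]
    # Illustrative heuristic standing in for a learned scorer:
    # reward topical overlap with the prompt plus a small length bonus.
    overlap = len(set(prompt.lower().split()) & set(text.lower().split()))
    return overlap + 0.01 * len(text.split())

# Usage: a complete response is scored; a truncated one is rejected.
score = reward("What is RLHF?", "RLHF aligns models with human feedback." + EOS)
```

In a real RLHF pipeline the heuristic would be replaced by a trained preference model, but the interface is the same: the scalar reward is defined only over fully generated texts.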

Updated 2026-05-01

Tags

Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences