Semantic Completeness in RLHF Reward Models
In Reinforcement Learning from Human Feedback (RLHF), the reward model assumes that both the input prompt and the generated output are complete texts. It therefore scores the relationship between a full prompt and a full response, each carrying complete semantic content, rather than judging partial or unfinished fragments. This is why, in typical RLHF pipelines, the scalar reward is assigned only once the model has finished generating the entire response.
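As a concrete illustration, here is a minimal sketch of scoring a complete (prompt, response) pair with a reward model implemented as a sequence classifier with a single output head, a common way such models are packaged. The checkpoint name my-org/rlhf-reward-model is a hypothetical placeholder, not a real model; any reward model trained with a one-label classification head would slot in the same way.

```python
# Minimal sketch: scoring a complete (prompt, response) pair with a
# scalar reward model. The checkpoint name below is hypothetical.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "my-org/rlhf-reward-model"  # placeholder, not a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def reward(prompt: str, response: str) -> float:
    """Return a scalar reward for one complete prompt-response pair.

    The model sees the full concatenated text; it is not designed to
    score a half-finished response.
    """
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1)
    return logits.squeeze().item()

# Usage: compare two complete responses to the same prompt.
prompt = "Explain photosynthesis to a ten-year-old."
r_a = reward(prompt, "Plants use sunlight to turn water and air into food.")
r_b = reward(prompt, "Plants make food by a process called photosynthesis, where...")
print(r_a, r_b)
```

The raw scores are only meaningful relative to one another: given two complete responses to the same prompt, the higher score marks the response the reward model prefers.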
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Notation for the RLHF Reward Model
A system designed to improve language model outputs uses a special component. This component takes a user's initial text (a prompt) and a model-generated response, then outputs a single numerical score. If this component processes two different responses for the exact same prompt, giving 'Response A' a score of 4.1 and 'Response B' a score of -0.5, what is the most accurate interpretation of these scores?
Identifying Reward Model Inputs and Output
Troubleshooting a Flawed Reward Model