Learn Before
Principle of Reward Model Adjustment
Imagine a system is being trained to prefer certain text outputs over others based on human feedback. If a human indicates that 'Output X' is better than 'Output Y', but the system initially assigns a higher score to 'Output Y', explain the fundamental principle that guides the adjustment of the system's scoring mechanism during its next training step.
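The principle in question is that the scoring mechanism is nudged so the preferred output's score rises relative to the rejected one. The following is a minimal sketch of one common way to formalize this, a pairwise (Bradley-Terry style) logistic loss with a hand-coded gradient step; the scores, learning rate, and step count are illustrative assumptions, not values from the card.

```python
import math

def pairwise_loss(score_preferred, score_rejected):
    # Bradley-Terry style loss: -log sigmoid(r_preferred - r_rejected).
    # It is large when the rejected output outscores the preferred one.
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

# Hypothetical starting scores: the human prefers Output X,
# but the model currently scores Output Y higher.
score_x, score_y = 0.2, 0.8

lr = 0.5  # illustrative learning rate
for _ in range(3):
    # Gradient of the loss w.r.t. the two scores:
    #   d loss / d score_x = -(1 - sigmoid(score_x - score_y))
    #   d loss / d score_y = +(1 - sigmoid(score_x - score_y))
    sig = 1.0 / (1.0 + math.exp(-(score_x - score_y)))
    push = 1.0 - sig      # shrinks toward 0 as the ranking is corrected
    score_x += lr * push  # preferred output's score is pushed up
    score_y -= lr * push  # rejected output's score is pushed down
```

After a few such steps the preferred output's score exceeds the rejected one's, and the update magnitude decays as the model's ranking comes to agree with the human preference.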
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
During the training of a reward model, a human is shown two responses to a prompt. The human indicates a preference for Response B over Response A. However, the reward model assigns a higher score to Response A than to Response B. Based on the core principle of the training process for this model, what is the most likely immediate outcome?
Reward Model Score Adjustment
Principle of Reward Model Adjustment