
Intuition of the Ranking Loss Function in RLHF

Despite its potentially complex mathematical form, the core idea behind the ranking loss function in RLHF is straightforward. The function operates on a simple penalty-and-reward basis: the reward model incurs a large loss when its predicted ranking for a pair of outputs contradicts the human-provided preference, and a small loss when its ranking aligns with the human-labeled ranking.
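The penalty-and-reward intuition can be sketched numerically. The sketch below assumes the commonly used pairwise (Bradley–Terry style) form of the loss, `-log sigmoid(r_chosen - r_rejected)`, where `r_chosen` and `r_rejected` are hypothetical scalar reward-model scores for the human-preferred and human-rejected outputs; the function name and example values are illustrative, not from the original text.

```python
import math

def ranking_loss(r_chosen, r_rejected):
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the chosen (human-preferred) output scores
    higher than the rejected one, and grows as the model's ranking
    contradicts the human label.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Model agrees with the human preference -> small penalty
print(round(ranking_loss(2.0, 0.0), 4))  # prints 0.1269

# Model contradicts the human preference -> large penalty
print(round(ranking_loss(0.0, 2.0), 4))  # prints 2.1269
```

Note that the loss is never exactly zero: even a correct ranking is pushed toward a larger margin, which is what drives the reward model to separate preferred and rejected outputs more confidently.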

Updated 2026-04-20

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences
