Formula

Worth Function in Plackett-Luce for RLHF Reward Modeling

In the context of applying the Plackett-Luce model to reward modeling in RLHF, the 'worth' of a specific response $\mathbf{y}$ is defined using the output of the reward function $r(\mathbf{x}, \mathbf{y})$. Specifically, the worth, denoted as $\alpha(\mathbf{y})$, is calculated as the exponential of the reward score:

$$\alpha(\mathbf{y}) = \exp(r(\mathbf{x}, \mathbf{y}))$$

This formulation ensures that the worth is always a positive value, a key requirement of the Plackett-Luce model, and that higher reward scores correspond to higher worths.
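As a rough illustration, the worth definition can be sketched in a few lines of Python. The function names `worth` and `plackett_luce_prob` are hypothetical, and the reward scores are made-up numbers standing in for the output of a trained reward model. The ranking probability used here is the standard Plackett-Luce form (the product over positions of a response's worth divided by the total worth of the responses not yet placed), which is an assumption about how the worth is consumed rather than something stated in this card.

```python
import math


def worth(reward):
    """Worth of a response: alpha(y) = exp(r(x, y)). Always positive."""
    return math.exp(reward)


def plackett_luce_prob(rewards_in_rank_order):
    """Probability of a ranking under the standard Plackett-Luce model.

    `rewards_in_rank_order` holds the reward scores r(x, y_k) of the
    responses, listed from most preferred to least preferred.
    """
    # Subtracting the maximum reward rescales every worth by the same
    # constant, which cancels in each ratio below. This is only a
    # numerical-stability measure and does not change the result.
    shift = max(rewards_in_rank_order)
    worths = [math.exp(r - shift) for r in rewards_in_rank_order]

    prob = 1.0
    for k in range(len(worths)):
        # Chance that the k-th response is picked first among those remaining.
        prob *= worths[k] / sum(worths[k:])
    return prob


# Hypothetical reward scores for three responses to the same prompt.
rewards = [1.5, 0.2, -0.7]
print([worth(r) for r in rewards])
print(plackett_luce_prob(rewards))
```

With only two responses, this probability reduces to the logistic function of the reward difference, which is the pairwise (Bradley-Terry) case.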

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
