Worth Function in Plackett-Luce for RLHF Reward Modeling
In the context of applying the Plackett-Luce model to reward modeling in RLHF, the 'worth' of a specific response is defined using the output of the reward function r. Specifically, the worth, denoted α, is calculated as the exponential of the reward score: α = exp(r). This formulation ensures that the worth is always positive, a key requirement of the Plackett-Luce model, and that higher reward scores correspond to higher worths.
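As a minimal sketch of this mapping (the reward values below are illustrative, not from the text), the exponential guarantees positive, order-preserving worths even for negative or zero reward scores:

```python
import math

def worth(reward_score: float) -> float:
    """Plackett-Luce worth: always positive, strictly increasing in the reward."""
    return math.exp(reward_score)

rewards = [-2.0, 0.0, 1.5]
worths = [worth(r) for r in rewards]

# Every worth is positive, even when the reward score is negative or zero.
assert all(w > 0 for w in worths)
# Higher reward score -> higher worth (strict monotonicity).
assert worths[0] < worths[1] < worths[2]
```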

Tags
Ch.4 Alignment - Foundations of Large Language Models
Computing Sciences
Related
Worth Function in Plackett-Luce for RLHF Reward Modeling
A team is training a reward model using human feedback. Instead of collecting simple pairwise comparisons (e.g., 'Response A is better than Response B'), they have collected full rankings of four responses for each prompt. They decide to use a listwise ranking model to train their reward model on this data. What is the primary conceptual advantage of this listwise approach compared to an alternative approach of simply breaking each ranked list down into all possible pairs and aggregating their individual losses?
Reward Model Training Strategy
Reward Model's Role in Listwise Preference Learning
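The listwise-versus-pairwise question above can be made concrete with a small sketch (the function names and reward values here are my own illustrative choices, not from the course material): the Plackett-Luce loss treats the full ranking as a single joint event, while the pairwise alternative aggregates independent Bradley-Terry losses over every pair drawn from the same list.

```python
import math

def pl_listwise_nll(rewards_best_to_worst):
    """Plackett-Luce NLL of a full ranking: at each step, the top remaining
    response is drawn with probability proportional to its worth exp(r)."""
    nll = 0.0
    for k in range(len(rewards_best_to_worst) - 1):
        remaining = rewards_best_to_worst[k:]
        log_denom = math.log(sum(math.exp(r) for r in remaining))
        nll -= remaining[0] - log_denom
    return nll

def pairwise_nll(rewards_best_to_worst):
    """Bradley-Terry loss summed over every pair implied by the ranking."""
    rs = rewards_best_to_worst
    nll = 0.0
    for i in range(len(rs)):
        for j in range(i + 1, len(rs)):
            nll -= math.log(1.0 / (1.0 + math.exp(rs[j] - rs[i])))
    return nll

# Four ranked responses (illustrative reward scores, best response first).
ranking = [2.0, 1.0, 0.5, -1.0]
listwise_loss = pl_listwise_nll(ranking)
pairwise_loss = pairwise_nll(ranking)  # aggregates 6 independent pair losses
```

For a list of two responses the two losses coincide; the difference only appears for longer lists, where the listwise likelihood couples the choices made at each rank rather than scoring each pair in isolation.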
Learn After
Policy Proportional to Exponentiated Reward
A system for ranking text responses first assigns a numerical reward score to each response, and then calculates a 'worth' value for each response using the formula: worth = exp(reward score). Consider two scenarios:
Scenario 1: Response A has a reward score of 3.0, and Response B has a reward score of 1.0. Scenario 2: Response C has a reward score of 8.0, and Response D has a reward score of 6.0.
How does the ratio of worths (Worth_A / Worth_B) in Scenario 1 compare to the ratio of worths (Worth_C / Worth_D) in Scenario 2?
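A quick numerical check of the two scenarios above (the arithmetic follows from the identity exp(a) / exp(b) = exp(a − b)):

```python
import math

# Scenario 1: reward scores 3.0 and 1.0; Scenario 2: reward scores 8.0 and 6.0.
ratio_1 = math.exp(3.0) / math.exp(1.0)
ratio_2 = math.exp(8.0) / math.exp(6.0)

# exp(a) / exp(b) == exp(a - b), so only the score *difference* matters;
# both differences equal 2.0, so the two ratios coincide.
assert math.isclose(ratio_1, ratio_2)
assert math.isclose(ratio_1, math.exp(2.0))
```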
A system for modeling human preferences assigns a numerical reward score, r, to a given text response. This score can be positive, negative, or zero. To use these scores in a specific type of ranking probability model, each score r must be converted into a 'worth' value α that is always positive and strictly increases as r increases. A researcher proposes using the function α = r² + 0.1 for this conversion. Which statement correctly analyzes the suitability of this proposed function?

A system models preferences by first assigning a numerical reward score to a response and then converting it to a 'worth' value using the formula: worth = exp(reward_score). An engineer improves a response, causing its reward score to increase first from 2.0 to 3.0, and then, with a further improvement, from 3.0 to 4.0. How does the increase in the response's 'worth' value during the first improvement compare to the increase during the second improvement?
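A numerical check of both questions above (a minimal sketch; the probe values −3.0 and 1.0 in the first check are illustrative choices of mine):

```python
import math

# Question 1: is alpha = r**2 + 0.1 a valid worth function?
proposed = lambda r: r ** 2 + 0.1
# It is always positive, but it is NOT strictly increasing in r:
# a worse (more negative) score can yield a *larger* worth.
assert proposed(-3.0) > proposed(1.0)  # 9.1 > 1.1, monotonicity fails

# Question 2: compare the worth increases under worth = exp(reward_score).
first_increase = math.exp(3.0) - math.exp(2.0)   # reward 2.0 -> 3.0
second_increase = math.exp(4.0) - math.exp(3.0)  # reward 3.0 -> 4.0
# exp is convex: equal reward gains give growing worth gains, and each
# additional one-point gain multiplies the previous increase by e.
assert second_increase > first_increase
assert math.isclose(second_increase / first_increase, math.e)
```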