1Cademy - Worth Function in Plackett-Luce Model

Learn Before

Policy Proportional to Exponentiated Reward

Formula

Worth Function in Plackett-Luce Model

In the Plackett-Luce model, a 'worth' value, denoted as $\alpha(y)$, is assigned to each possible response $y$. This value is defined as the exponential of a reward score $r(x, y)$, which is associated with generating response $y$ given an input $x$. The formula is: $\alpha(y) = \exp(r(x, y))$
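The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `worth`, the dictionary layout, and the reward values (chosen only for illustration) are assumptions, and the rewards are taken as already computed for a single fixed input $x$.

```python
import math


def worth(reward: float) -> float:
    """Return the worth alpha(y) = exp(r(x, y)) for one reward score."""
    return math.exp(reward)


# Hypothetical reward scores r(x, y) for three candidate responses
# to the same input x (illustrative values, not real model outputs).
rewards = {"A": 2.0, "B": 3.0, "C": 4.0}

# Worth of each response: the exponential of its reward.
worths = {name: worth(r) for name, r in rewards.items()}

for name, value in worths.items():
    print(f"worth({name}) = {value:.4f}")
```

Because the exponential is strictly positive, every response receives a positive worth, even when its reward score is negative.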

Updated 2026-06-29

References

Reference of Foundations of Large Language Models Course

Learn After

Plackett-Luce Selection Probability Formula
A system assigns a 'worth' value to potential text completions, calculated as the exponential of a reward score. Initially, three completions (A, B, C) have reward scores of 2.0, 3.0, and 4.0, respectively. If the reward score for each completion is increased by a constant value of 1.0, how does this change affect the ratio of worth between any two completions (e.g., the ratio of worth(B) to worth(A))?
Calculating Response Worth for an AI Assistant
In a system that assigns a 'worth' value to a response by taking the exponential of its reward score, doubling the reward score for a response will also double its assigned worth value.
