Formula

Worth Function in Plackett-Luce Model

In the Plackett-Luce model, a 'worth' value, denoted $\alpha(y)$, is assigned to each possible response $y$. This value is defined as the exponential of a reward score $r(x, y)$, which is associated with generating response $y$ given an input $x$. The formula is: $\alpha(y) = \exp(r(x, y))$
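As a minimal sketch of how the worth function might be used, the Python snippet below computes worths from reward scores and plugs them into the standard Plackett-Luce ranking probability (each ranked item's worth divided by the total worth of the items not yet placed). The ranking formula is not stated on this card and is included here as an assumption about the usual form of the model. The function names `worth` and `plackett_luce_prob` and the reward values are hypothetical and chosen only for illustration.

```python
import itertools
import math


def worth(reward: float) -> float:
    """Worth of a response: alpha(y) = exp(r(x, y))."""
    return math.exp(reward)


def plackett_luce_prob(rewards_in_rank_order) -> float:
    """Probability of a ranking under the standard Plackett-Luce form.

    `rewards_in_rank_order` holds r(x, y) for each response, best first.
    Assumption: this ranking formula is the usual one, not taken from the card.
    """
    worths = [worth(r) for r in rewards_in_rank_order]
    prob = 1.0
    for k in range(len(worths)):
        # Worth of the item placed at position k, normalized by the
        # total worth of all items that are still unranked.
        prob *= worths[k] / sum(worths[k:])
    return prob


# Hypothetical reward scores for three candidate responses.
rewards = [2.0, 1.0, 0.0]
worths = [worth(r) for r in rewards]

# Probabilities over all possible orderings should sum to 1.
total = sum(plackett_luce_prob(p) for p in itertools.permutations(rewards))
```

Because the exponential is strictly positive, every worth is positive regardless of the sign of the reward, so the ratios above are always well defined. With only two responses the ranking probability reduces to a logistic function of the reward difference.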

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences