Plackett-Luce Selection Probability Formula
In the Plackett-Luce model, the probability of selecting a specific response $y_i$ from a set of possible responses $\{y_1, \ldots, y_K\}$ given an input $x$ is calculated by normalizing its "worth" value, $w(x, y_i)$. The selection probability is the worth of the selected response divided by the sum of the worths of all possible responses:

$$P(y_i \mid x) = \frac{w(x, y_i)}{\sum_{k=1}^{K} w(x, y_k)}$$
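Below is a minimal sketch of this normalization in plain Python (illustrative function name; it assumes, as the cards below do, that a response's worth is the exponential of its reward score):

```python
import math

def plackett_luce_selection_probs(rewards):
    """Probability of selecting each response: its worth divided by
    the sum of all worths, taking worth = exp(reward score)."""
    worths = [math.exp(r) for r in rewards]
    total = sum(worths)
    return [w / total for w in worths]

# Rewards 2.0, 3.0, 4.0 for responses A, B, C:
print([round(p, 3) for p in plackett_luce_selection_probs([2.0, 3.0, 4.0])])
# -> [0.09, 0.245, 0.665]
```

With worths defined this way, the selection probability is exactly the softmax of the reward scores, which is why the softmax cards appear in the Related list below.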
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Plackett-Luce Selection Probability Formula
Optimal Policy as a Product of Reference Policy and Exponentiated Reward
Worth Function in Plackett-Luce Model
A language model's policy, which determines the probability of generating an output y given an input x, is structured to be proportional to the exponential of a reward score r(x, y). For a specific input, two potential outputs have the following reward scores:
- Output A: Reward = 3.0
- Output B: Reward = 1.0
Based on this formulation, how does the probability of generating Output A compare to the probability of generating Output B?
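A worked comparison under this formulation:

$$\frac{P(\text{A} \mid x)}{P(\text{B} \mid x)} = \frac{e^{3.0}}{e^{1.0}} = e^{2.0} \approx 7.39,$$

so Output A is about 7.4 times as likely to be generated as Output B.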
Analyzing Language Model Response Probabilities
A language model's policy is designed such that the probability of generating an output is proportional to the exponential of its reward score. If Output Y has a reward score that is exactly double the reward score of Output Z, it means the policy will assign exactly double the probability to Output Y compared to Output Z.
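For reference, under the exponential parameterization the probability ratio tracks the difference of reward scores, not their ratio:

$$\frac{P(Y)}{P(Z)} = e^{r_Y - r_Z} = e^{2r_Z - r_Z} = e^{r_Z},$$

which equals 2 only in the special case $r_Z = \ln 2$.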
Plackett-Luce Selection Probability Formula
A system assigns a 'worth' value to potential text completions, calculated as the exponential of a reward score. Initially, three completions (A, B, C) have reward scores of 2.0, 3.0, and 4.0, respectively. If the reward score for each completion is increased by a constant value of 1.0, how does this change affect the ratio of worth between any two completions (e.g., the ratio of worth(B) to worth(A))?
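A worked check of the uniform shift:

$$\frac{w'(B)}{w'(A)} = \frac{e^{3.0 + 1.0}}{e^{2.0 + 1.0}} = \frac{e^{3.0}\,e^{1.0}}{e^{2.0}\,e^{1.0}} = \frac{e^{3.0}}{e^{2.0}} = e \approx 2.72,$$

so the common factor $e^{1.0}$ cancels and every pairwise worth ratio is unchanged.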
Calculating Response Worth for an AI Assistant
In a system that assigns a 'worth' value to a response by taking the exponential of its reward score, doubling the reward score for a response will also double its assigned worth value.
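The algebra behind this claim: doubling a reward score squares the worth rather than doubling it, since

$$e^{2r} = \left(e^{r}\right)^2 \neq 2e^{r} \quad \text{in general},$$

with equality only when $e^{r} = 2$, i.e. $r = \ln 2$.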
Pros and Cons of Softmax Function
Softmax Regression (Activation)
Parameterized Softmax Layer
Plackett-Luce Selection Probability Formula
Conditional Probability Formula for Autoregressive Models using Softmax
A neural network's final layer produces the raw output scores (logits) [2.0, 1.0, 0.1] for three possible classes. To convert these scores into class probabilities, a function is applied that first exponentiates each score and then normalizes these new values by dividing each by their sum. What is the resulting probability distribution? (Values are rounded to three decimal places.)

A function is used to convert a vector of raw, unnormalized scores z = [z_1, z_2, ..., z_K] into a probability distribution. This function operates by first applying the standard exponential function to each score and then normalizing these new values by dividing each by their sum. If a constant value C is added to every score in the input vector z, resulting in a new vector z' = [z_1+C, z_2+C, ..., z_K+C], how will the resulting output probability distribution be affected?

Consider two input vectors of raw scores (logits) for a 3-class classification problem: Vector A = [1, 2, 3] and Vector B = [1, 5, 10]. Both vectors are passed through a function that exponentiates each score and then normalizes the results by dividing by their sum. How will the resulting probability distribution for Vector B compare to the one for Vector A? (A softmax sketch working through these cases appears at the end of this Related list.)

You’re reviewing an internal evaluation script tha...
Your team is building an internal tool that ranks ...
You’re reviewing an internal LLM evaluation pipeli...
Reconciling Training Log-Likelihood with Inference-Time Sequence Selection
Explaining a Counterintuitive Decoding Outcome Using Softmax, Next-Token Conditionals, and Sequence Log-Probability
Diagnosing a “High-Confidence Wrong Token” Bug in Autoregressive Scoring
Investigating a Production Scoring Bug: Softmax Normalization vs. Autoregressive Sequence Log-Probability
Design a Correct Sequence-Scoring Function for Autoregressive LLM Outputs
Root-Cause Analysis: Why a “More Likely” Token-by-Token Completion Loses on Total Sequence Score
Auditing a Candidate Completion Using Softmax Next-Token Probabilities and Autoregressive Log-Probability
Derivative of Softmax Cross-Entropy Loss with Respect to Logits
Numerical Overflow in Softmax Function
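The softmax questions above share one mechanism, sketched below in plain Python (illustrative code, not from the course); the max-subtraction line is the standard fix for the overflow issue named in the last card, and by shift invariance it does not change the output:

```python
import math

def softmax(logits):
    # Subtract the max logit first: exp(z - max) avoids overflow for
    # large scores and, by shift invariance, leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])
# -> [0.659, 0.242, 0.099]

# Shift invariance: adding a constant C to every logit cancels in the ratio.
print(softmax([1, 2, 3]) == softmax([101, 102, 103]))  # True

# Larger gaps between logits concentrate the distribution.
print([round(p, 3) for p in softmax([1, 5, 10])])
# -> [0.0, 0.007, 0.993]
```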
Learn After
A language model must choose the best response from a set of three options: A, B, and C. A reward function provides the following scores for each option: Option A has a score of 2.0, Option B has a score of 1.0, and Option C has a score of 0.5. Assuming the probability of selecting an option is calculated by normalizing its exponentiated reward score against the sum of all exponentiated scores, what is the approximate probability of the model selecting Option A?
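A worked evaluation of this normalization:

$$P(A) = \frac{e^{2.0}}{e^{2.0} + e^{1.0} + e^{0.5}} \approx \frac{7.389}{7.389 + 2.718 + 1.649} = \frac{7.389}{11.756} \approx 0.63.$$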
Impact of Uniform Reward Shift on Selection Probabilities
Consider a model that selects a response from a set of options, where the probability of selecting any given response is proportional to the exponential of its reward score. If response Y has a reward score that is exactly twice the reward score of response Z, the model's probability of selecting Y will be exactly twice its probability of selecting Z.
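As a quick numeric check of this claim: with $r_Z = 1.0$ and $r_Y = 2.0$, the stated proportionality gives $P(Y)/P(Z) = e^{2.0 - 1.0} = e \approx 2.72$, not 2; the exponential converts score differences, not score ratios, into probability ratios.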