Example of Listwise Ranking in RLHF
A practical instance of listwise ranking in Reinforcement Learning from Human Feedback (RLHF) involves human experts ordering multiple model-generated outputs for a single prompt. For example, if a dataset sample contains a set of four generated outputs, denoted as {y1, y2, y3, y4}, an expert might order them from most preferred to least preferred. One possible ranking could be y3 ≻ y1 ≻ y4 ≻ y2, which indicates that y3 is the best response, followed sequentially by y1, y4, and finally y2.
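Such a full ordering can be modeled probabilistically, for instance with the Plackett-Luce model mentioned among the related cards: the top remaining output is picked with probability proportional to the exponential of its score, then the next, and so on. A minimal sketch, assuming hypothetical reward-model scores for the four outputs listed in their ranked order:

```python
import math

def plackett_luce_prob(scores):
    """Plackett-Luce probability of a ranking, given scores for the
    items listed best-to-worst. At each step, the top remaining item
    is drawn with probability proportional to exp(score)."""
    exps = [math.exp(s) for s in scores]
    prob = 1.0
    for i in range(len(exps)):
        prob *= exps[i] / sum(exps[i:])
    return prob

# Hypothetical scores for four outputs, already in preferred order.
print(plackett_luce_prob([2.0, 1.0, 0.5, -1.0]))
```

With equal scores every permutation is equally likely, so a four-item ranking gets probability 1/24; higher scores for the top-ranked items push the probability of the observed ranking toward 1.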
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Ch.4 Alignment - Foundations of Large Language Models
Related
Example of a Human Preference Ranking in RLHF
Listwise Loss from Accumulated Pairwise Comparisons
Plackett-Luce Model for Listwise Ranking
Example of Listwise Ranking in RLHF
A team is developing a language model to generate compelling short story endings. To gather human feedback, they generate four different endings for each story prompt. They are considering two feedback collection strategies:
Strategy 1: Human annotators are shown all four endings at once and asked to order them from best to worst.
Strategy 2: Human annotators are shown each of the four endings one at a time and asked to rate its quality on a scale of 1 to 10.
Based on the goal of collecting the most reliable data for model improvement, which strategy is generally more effective and why?
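One reason rankings tend to be more reliable than independent scores is that absolute scales differ across annotators, while the ordering they induce is often stable. A minimal sketch with two hypothetical annotators whose raw scores disagree but whose rankings agree:

```python
# Two hypothetical annotators score the same four endings on a 1-10
# scale. One is harsh, one is generous, so their raw scores differ,
# but the best-to-worst order each set of scores implies is identical.
annotator_a = {"ending1": 3, "ending2": 7, "ending3": 5, "ending4": 2}
annotator_b = {"ending1": 6, "ending2": 10, "ending3": 8, "ending4": 5}

def induced_ranking(scores):
    # Best-to-worst order implied by the scores.
    return sorted(scores, key=scores.get, reverse=True)

print(induced_ranking(annotator_a))  # ['ending2', 'ending3', 'ending1', 'ending4']
print(induced_ranking(annotator_b))  # ['ending2', 'ending3', 'ending1', 'ending4']
```

Averaging the raw scores would mix calibration noise into the signal, whereas asking directly for a ranking (Strategy 1) records only the comparative judgment.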
Improving Feedback Collection for a Chatbot
When using a listwise ranking approach to collect human feedback for a language model, the primary task for an annotator is to order a set of the model's generated outputs from most preferred to least preferred, rather than to assign an independent numerical quality score (e.g., 1 to 10) to each output.
Example of a Human Preference Ranking in RLHF
Ranked Preference Notation
Example of Listwise Ranking in RLHF
A language model generates two different summaries for a given article: Summary 1 and Summary 2. A human evaluator is tasked with reviewing them and determines that Summary 1 is more coherent and factually accurate than Summary 2. How would this specific judgment be formally expressed using standard preference notation?
A human annotator provides the following judgments for four text completions (C1, C2, C3, C4) generated in response to a single prompt: C1 ≻ C4, C4 ≻ C2, and C2 ≻ C3. Based on this information, arrange the completions in order from most preferred to least preferred.
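Judgments like these can be chained into a total order by treating each comparison as an edge in a preference graph and topologically sorting it. A minimal sketch using Python's standard `graphlib`, with the three pairwise judgments above:

```python
from graphlib import TopologicalSorter

# Pairwise judgments: each tuple (a, b) means a ≻ b.
judgments = [("C1", "C4"), ("C4", "C2"), ("C2", "C3")]

# graphlib expects a mapping from each node to its predecessors,
# i.e. the items that must come before it (the items that beat it).
graph = {}
for winner, loser in judgments:
    graph.setdefault(loser, set()).add(winner)

order = list(TopologicalSorter(graph).static_order())
print(" ≻ ".join(order))  # C1 ≻ C4 ≻ C2 ≻ C3
```

If the judgments contained a cycle (e.g. C3 ≻ C1 added to the above), `static_order` would raise a `CycleError`, which is one way inconsistent annotations can be detected.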
Limitations of Preference Notation
Learn After
A team is refining a language model. For a single user prompt, the model generates four distinct responses: Response 1, Response 2, Response 3, and Response 4. A human evaluator is tasked with ordering these responses from best to worst. The evaluator concludes that Response 3 is the most helpful. Response 1 is the second-best, followed by Response 4. Response 2 is deemed the least helpful. Using the notation where '≻' signifies 'is preferred over,' which option correctly represents the evaluator's complete ranking?
A human evaluator was asked to rank three different responses (Response A, Response B, Response C) generated by a language model for the same prompt. Match each formal preference notation with the correct description of the evaluator's ranking.
Interpreting Evaluator Preferences