Example

Example of Listwise Ranking in RLHF

A practical instance of listwise ranking in Reinforcement Learning from Human Feedback (RLHF) involves human experts ordering multiple model-generated outputs for a single prompt. For example, if a dataset sample contains a set of four generated outputs, denoted as $\{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \mathbf{y}_4\}$, an expert might order them from most preferred to least preferred. One possible ranking could be $\mathbf{y}_2 \succ \mathbf{y}_3 \succ \mathbf{y}_1 \succ \mathbf{y}_4$, which indicates that $\mathbf{y}_2$ is the best response, followed sequentially by $\mathbf{y}_3$, $\mathbf{y}_1$, and finally $\mathbf{y}_4$.
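To make the ranking concrete, the following minimal Python sketch shows two things one might do with such an annotation. First, it expands the ordered list into the pairwise preferences it implies. Second, it scores the ranking with a Plackett-Luce negative log-likelihood, which is one common choice of listwise objective; the text above does not say which loss is used, so treat this as an assumption. The function names `ranking_to_pairs` and `plackett_luce_nll` and the reward values are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a ranking (best first) into (preferred, dispreferred) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always ranked above the second.
    return list(combinations(ranking, 2))


def plackett_luce_nll(ranking, rewards):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    `rewards` maps each output name to a scalar score (hypothetical
    reward-model outputs). At each position, the chosen item competes
    against all items not yet placed.
    """
    scores = [rewards[name] for name in ranking]
    nll = 0.0
    for k in range(len(scores) - 1):
        remaining = scores[k:]
        # Numerically stable log-sum-exp over the remaining items.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        nll += log_norm - scores[k]
    return nll


# The expert ranking from the example: y2 > y3 > y1 > y4.
ranking = ["y2", "y3", "y1", "y4"]
pairs = ranking_to_pairs(ranking)

# Illustrative reward scores only; not taken from any real model.
rewards = {"y1": 0.1, "y2": 1.2, "y3": 0.7, "y4": -0.5}
loss = plackett_luce_nll(ranking, rewards)

print(pairs)
print(loss)
```

A ranking of four outputs implies six pairwise preferences, so a single listwise annotation carries more comparison information than a single pairwise label. Under the Plackett-Luce assumption, the loss is lower when the reward scores agree with the expert ordering than when they contradict it.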


Updated 2026-05-02

