Example of a Human Preference Ranking in RLHF
In the data annotation stage of RLHF, human evaluators rank multiple model-generated outputs for a given prompt. For example, if four outputs (y₁, y₂, y₃, y₄) are presented, an annotator's preference might be expressed with the ranking y₂ ≻ y₃ ≻ y₁ ≻ y₄. This indicates that y₂ is the most preferred response, followed by y₃ and y₁, with y₄ being the least preferred.
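A ranking like this implicitly encodes every pairwise preference between the responses. A minimal sketch (the labels y1–y4 are illustrative placeholders, not tied to any particular model output) of expanding a ranked list into its implied (winner, loser) pairs:

```python
# A preference ranking over four responses, most to least preferred.
# Labels are illustrative placeholders for model-generated outputs.
ranking = ["y2", "y3", "y1", "y4"]  # i.e. y2 ≻ y3 ≻ y1 ≻ y4

# Every response earlier in the list is preferred over every later one,
# so a ranking of n items implies n*(n-1)/2 pairwise comparisons.
pairs = [(ranking[i], ranking[j])
         for i in range(len(ranking))
         for j in range(i + 1, len(ranking))]

print(pairs)
# → [('y2', 'y3'), ('y2', 'y1'), ('y2', 'y4'),
#    ('y3', 'y1'), ('y3', 'y4'), ('y1', 'y4')]
```

This expansion is what lets a single listwise annotation be consumed by pairwise reward-model losses, as in the accumulated-pairwise-comparison formulation mentioned below.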

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Related
Example of a Human Preference Ranking in RLHF
Listwise Loss from Accumulated Pairwise Comparisons
Plackett-Luce Model for Listwise Ranking
Example of Listwise Ranking in RLHF
A team is developing a language model to generate compelling short story endings. To gather human feedback, they generate four different endings for each story prompt. They are considering two feedback collection strategies:
Strategy 1: Human annotators are shown all four endings at once and asked to order them from best to worst.
Strategy 2: Human annotators are shown each of the four endings one at a time and asked to rate its quality on a scale of 1 to 10.
Based on the goal of collecting the most reliable data for model improvement, which strategy is generally more effective and why?
Improving Feedback Collection for a Chatbot
When using a listwise ranking approach to collect human feedback for a language model, the primary task for an annotator is to order all of the model's generated outputs from best to worst in a single pass, rather than assigning each output an independent numerical quality score (e.g., 1 to 10).
Example of a Human Preference Ranking in RLHF
Ranked Preference Notation
Example of Listwise Ranking in RLHF
A language model generates two different summaries for a given article: Summary 1 and Summary 2. A human evaluator is tasked with reviewing them and determines that Summary 1 is more coherent and factually accurate than Summary 2. How would this specific judgment be formally expressed using standard preference notation?
A human annotator provides the following judgments for four text completions (C1, C2, C3, C4) generated in response to a single prompt: C1 ≻ C4, C4 ≻ C2, and C2 ≻ C3. Based on this information, arrange the completions in order from most preferred to least preferred.
Limitations of Preference Notation
Learn After
A team is refining a language model using human feedback. For a specific user prompt, the model generated four different responses (labeled y₁, y₂, y₃, and y₄). A human annotator provided the following preference ranking for these responses: y₃ ≻ y₁ ≻ y₄ ≻ y₂. Based on this feedback, which response should the team identify as the second-most preferred?
Deconstructing a Preference Ranking
A human evaluator is comparing four responses (labeled y₁, y₂, y₃, and y₄) generated by a language model. They provide the following individual judgments:
- Response y₂ is preferred over response y₄.
- Response y₁ is preferred over response y₃.
- Response y₄ is preferred over response y₁.
Based on these judgments, arrange the four responses in a single list from most preferred to least preferred.
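Merging individual pairwise judgments like the ones above into a single ranking is, in effect, a topological sort over the preference graph. A minimal sketch, assuming the judgments are consistent (no preference cycles); the labels match the example above:

```python
from graphlib import TopologicalSorter

def total_order(pairs):
    """Merge pairwise preferences, given as (winner, loser) tuples,
    into one ranking from most to least preferred. Assumes the
    comparisons are consistent (acyclic)."""
    # graphlib expects node -> set of predecessors, so for each
    # judgment the winner becomes a predecessor of the loser.
    graph = {}
    for winner, loser in pairs:
        graph.setdefault(loser, set()).add(winner)
        graph.setdefault(winner, set())
    # Nodes with no predecessors are emitted first, so the most
    # preferred response comes out at the head of the list.
    return list(TopologicalSorter(graph).static_order())

# Judgments from the example above: y2 ≻ y4, y1 ≻ y3, y4 ≻ y1
print(total_order([("y2", "y4"), ("y1", "y3"), ("y4", "y1")]))
# → ['y2', 'y4', 'y1', 'y3']
```

Note that if the judgments leave some pair uncompared, the resulting order is only one valid arrangement; a unique total order requires enough comparisons to connect every response in a single chain.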