Learn Before
A team is refining a language model using human feedback. For a specific user prompt, the model generated four different responses (labeled y₁, y₂, y₃, and y₄). A human annotator provided the following preference ranking for these responses: y₃ ≻ y₁ ≻ y₄ ≻ y₂. Based on this feedback, which response should the team identify as the second-most preferred?
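The selection asked for above can be sketched in a few lines of Python. This is only an illustration of reading off a position from the stated ranking; the identifier `ranking` is not from the source:

```python
# The annotator's ranking, from most to least preferred (as stated in the question).
ranking = ["y3", "y1", "y4", "y2"]

# "Second-most preferred" is simply the element at index 1 of the ordered list.
second_most = ranking[1]
print(second_most)  # → y1
```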
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Deconstructing a Preference Ranking
A human evaluator is comparing four responses (labeled y₁, y₂, y₃, and y₄) generated by a language model. They provide the following individual judgments:
- Response y₂ is preferred over response y₄.
- Response y₁ is preferred over response y₃.
- Response y₄ is preferred over response y₁.
Based on these judgments, arrange the four responses in a single list from most preferred to least preferred.
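Chaining the judgments above (y₂ ≻ y₄, y₄ ≻ y₁, y₁ ≻ y₃) into one list can be done programmatically by taking the transitive closure of the pairwise preferences and sorting by how many responses each one beats. A minimal sketch, assuming acyclic judgments; names like `judgments` and `beats` are illustrative, not from the source:

```python
from collections import defaultdict

# Pairwise judgments from the question, as (winner, loser) pairs.
judgments = [("y2", "y4"), ("y1", "y3"), ("y4", "y1")]

# Record which responses each response directly beats.
beats = defaultdict(set)
for winner, loser in judgments:
    beats[winner].add(loser)

# Transitive closure: if a beats b and b beats c, then a beats c.
# Repeat until no new relations are added (fine for four items).
changed = True
while changed:
    changed = False
    for w in list(beats):
        for l in list(beats[w]):
            new = beats[l] - beats[w]
            if new:
                beats[w] |= new
                changed = True

responses = ["y1", "y2", "y3", "y4"]
# A total order exists here because the judgments chain without cycles,
# so sorting by the number of responses beaten gives the full ranking.
order = sorted(responses, key=lambda r: len(beats[r]), reverse=True)
print(" \u227b ".join(order))  # → y2 ≻ y4 ≻ y1 ≻ y3
```

The same reachability idea scales to any acyclic set of pairwise preferences; with cycles or missing comparisons, no single total order would be recoverable.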