1Cademy - A human evaluator was asked to rank three different responses (Response A, Response B, Response C) generated by a language model for the same prompt. Match each formal preference notation with the correct description of the evaluator's ranking.

Learn Before

Example of Listwise Ranking in RLHF

Matching

A human evaluator was asked to rank three different responses (Response A, Response B, Response C) generated by a language model for the same prompt. Match each formal preference notation with the correct description of the evaluator's ranking.
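As a minimal sketch of how a listwise ranking maps onto the '≻' notation, the snippet below writes an ordered list of responses as a preference chain and lists the pairwise preferences that chain implies. The ranking shown is a hypothetical illustration, not one of the options in this question, and the helper names `to_preference_notation` and `pairwise_preferences` are assumptions made for this example.

```python
from itertools import combinations


def to_preference_notation(ranking):
    """Join responses, ordered from best to worst, with the '≻' symbol."""
    return " ≻ ".join(ranking)


def pairwise_preferences(ranking):
    """Return every (preferred, less_preferred) pair implied by the ranking.

    combinations() keeps the input order, so the first element of each
    pair always appears earlier (is ranked higher) in the list.
    """
    return list(combinations(ranking, 2))


# Hypothetical ranking, best to worst (an assumption for illustration only).
ranking = ["Response B", "Response A", "Response C"]

print(to_preference_notation(ranking))
for better, worse in pairwise_preferences(ranking):
    print(f"{better} ≻ {worse}")
```

Reading the chain left to right gives the full ordering, and a ranking of n responses implies n(n-1)/2 pairwise preferences, so three responses yield three pairs.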

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A team is refining a language model. For a single user prompt, the model generates four distinct responses: Response 1, Response 2, Response 3, and Response 4. A human evaluator is tasked with ordering these responses from best to worst. The evaluator concludes that Response 3 is the most helpful. Response 1 is the second-best, followed by Response 4. Response 2 is deemed the least helpful. Using the notation where '≻' signifies 'is preferred over,' which option correctly represents the evaluato
Interpreting Evaluator Preferences