Diagnosing Inconsistent Preference Labeling
A team is collecting data to train a helpful AI assistant, but its human labelers are giving inconsistent quality ratings. The team suspects the instructional examples are the problem. Analyze the example provided to labelers in the case study below. Identify the primary weakness in the 'Reasoning' section and explain how you would rewrite it to give labelers a clearer, more effective, and more repeatable analytical process.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Inconsistent Preference Labeling
A research team is creating instructions for human labelers who will be rating the quality of two different AI-generated responses to a user's query. The team wants to include an example in their instructions that not only shows a preference but also models a clear, step-by-step reasoning process to guide the labelers. Which of the following examples best accomplishes this goal?
Improving a Preference Labeling Prompt with Chain-of-Thought