A research team is training a model to score the quality of AI-generated text. They are considering two approaches for collecting human feedback to train this scoring model:
- Approach A: Show a human evaluator two different text outputs for the same prompt and ask them to choose which one is better. The scoring model is then trained to predict this preference.
- Approach B: Show a human evaluator a single text output and ask them to rate its quality on a scale of 1 to 10. The scoring model is then trained to predict this specific rating.
What is the primary conceptual advantage of Approach B's framing of the learning task?
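To make the two framings concrete, here is a minimal sketch of the training objective each one implies. The question does not name any loss function, so both are assumptions: Approach A is illustrated with a Bradley-Terry-style logistic loss on the score difference, and Approach B with a squared-error regression loss. The function names `pairwise_loss` and `pointwise_loss` are hypothetical.

```python
import math


def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Approach A (assumed Bradley-Terry-style loss): negative log-probability
    that the preferred output outscores the rejected one."""
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(predicted_score: float, human_rating: float) -> float:
    """Approach B (assumed squared-error loss): regression onto the human's 1-10 rating."""
    return (predicted_score - human_rating) ** 2


# Approach A only constrains the difference between two scores,
# so shifting both scores by the same amount leaves the loss unchanged.
same_margin = pairwise_loss(5.0, 3.0) == pairwise_loss(102.0, 100.0)

# Approach B ties the model's output to an absolute target value.
regression_error = pointwise_loss(7.5, 8.0)
```

Under these assumptions, the pairwise objective sees only relative ordering, while the pointwise objective is an ordinary regression problem with a directly interpretable target.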
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Choosing a Feedback Collection Method
Advantage of Absolute Scoring for Feedback