1Cademy

Learn Before

Conceptual Advantages of Pointwise Methods in RLHF

Multiple Choice

A research team is training a model to score the quality of AI-generated text. They are considering two approaches for collecting human feedback to train this scoring model:

Approach A: Show a human evaluator two different text outputs for the same prompt and ask them to choose which one is better. The scoring model is then trained to predict this preference.
Approach B: Show a human evaluator a single text output and ask them to rate its quality on a scale of 1 to 10. The scori
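
As a rough illustration of how the two kinds of feedback could turn into training signals, the sketch below pairs Approach A with a Bradley-Terry style pairwise loss and Approach B with a squared-error loss on the 1 to 10 rating. Both loss choices are assumptions made for illustration, since the question does not say how the scoring model is trained. The function names `pairwise_loss` and `pointwise_loss` are hypothetical.

```python
import math


def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Approach A (assumed loss): -log(sigmoid(s_preferred - s_rejected)).

    Small when the model scores the human-preferred output higher,
    large when it ranks the pair the wrong way round.
    """
    margin = score_preferred - score_rejected
    # Equivalent to log(1 + exp(-margin)), arranged to avoid overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pointwise_loss(predicted_score: float, human_rating: float) -> float:
    """Approach B (assumed loss): squared error against the 1-10 rating."""
    return (predicted_score - human_rating) ** 2


# Approach A uses only the ordering of two outputs for the same prompt.
loss_correct_order = pairwise_loss(2.0, 0.0)
loss_wrong_order = pairwise_loss(0.0, 2.0)

# Approach B uses one output and an absolute rating.
loss_rating = pointwise_loss(7.0, 9.0)
```

Under these assumptions, the pairwise loss depends only on the difference between the two scores, while the pointwise loss ties the model to the absolute scale that evaluators used.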

Updated 2025-09-28
