Multiple Choice

A development team aims to fine-tune a language model to be 'helpful and harmless'—qualities that are nuanced and difficult to exemplify perfectly. They consider two strategies:

  1. Supervised Approach: Have human experts write ideal, 'gold-standard' responses to a wide range of prompts for the model to imitate.
  2. Preference-Based Approach: Have the model generate multiple responses to each prompt, and then have human experts rank these responses from best to worst.

What is the primary reason that the preference-based approach is often more effective for aligning a model with such complex human values?
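
To make the preference-based approach concrete, below is a minimal sketch of how ranked responses become a training signal, assuming PyTorch and a Bradley-Terry-style pairwise loss over reward-model scores (the formulation commonly used in RLHF reward modeling; `TinyRewardModel` and the random embeddings are illustrative stand-ins, not any particular system's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def pairwise_preference_loss(r_preferred: torch.Tensor,
                             r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: P(preferred beats rejected)
    # = sigmoid(r_preferred - r_rejected); minimize its negative log.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Each ranked list of k responses yields k*(k-1)/2 ordered pairs;
# here we fake a batch of 8 such pairs with random embeddings.
model = TinyRewardModel()
better = torch.randn(8, 16)   # embeddings of higher-ranked responses
worse = torch.randn(8, 16)    # embeddings of lower-ranked responses
loss = pairwise_preference_loss(model(better), model(worse))
loss.backward()               # gradients push the scores apart in ranked order
```

The key design point this illustrates: annotators never have to produce a perfect response, only judge which of two imperfect ones is better, and those relative judgments are enough to fit a reward signal the model can then be optimized against.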
