Multiple Choice

A development team aims to fine-tune a language model to be 'helpful and harmless'—qualities that are nuanced and difficult to exemplify perfectly. They consider two strategies:

  1. Supervised Approach: Have human experts write ideal, 'gold-standard' responses to a wide range of prompts for the model to imitate.
  2. Preference-Based Approach: Have the model generate multiple responses to each prompt, and then have human experts rank these responses from best to worst.

What is the primary reason that the preference-based approach is often more effective for aligning a model with such complex human values?
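
To make the preference-based approach concrete, below is a minimal sketch of how ranked responses become a training signal, assuming PyTorch and a Bradley-Terry-style pairwise loss over reward-model scores (the formulation commonly used in RLHF reward modeling; `TinyRewardModel` and the random embeddings are illustrative stand-ins, not any particular system's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def pairwise_preference_loss(r_preferred: torch.Tensor,
                             r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: P(preferred beats rejected)
    # = sigmoid(r_preferred - r_rejected); minimize its negative log.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Each ranked list of k responses yields k*(k-1)/2 ordered pairs;
# here we fake a batch of 8 such pairs with random embeddings.
model = TinyRewardModel()
better = torch.randn(8, 16)   # embeddings of higher-ranked responses
worse = torch.randn(8, 16)    # embeddings of lower-ranked responses
loss = pairwise_preference_loss(model(better), model(worse))
loss.backward()               # gradients push the scores apart in ranked order
```

The key design point this illustrates: annotators never have to produce a perfect response, only judge which of two imperfect ones is better, and those relative judgments are enough to fit a reward signal the model can then be optimized against.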
