Multiple Choice

A development team is fine-tuning a model to be a general-purpose, open-domain question-answering assistant. They are considering two approaches for creating the training dataset:

  1. Having a small, dedicated team of experts write 10,000 high-quality question-answer pairs.
  2. Programmatically collecting and filtering 100,000 question-answer pairs from various public Q&A websites.

Which approach is more likely to result in a model that can handle a wider variety of unanticipated user questions, and what is the primary reason?

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science