Learn Before

Benefits of Using Q&A Website Data for Fine-Tuning

Multiple Choice

A development team is fine-tuning a model to be a general-purpose, open-domain question-answering assistant. They are considering two approaches for creating the training dataset:

1. Having a small, dedicated team of experts write 10,000 high-quality question-answer pairs.
2. Programmatically collecting and filtering 100,000 question-answer pairs from various public Q&A websites.

Which approach is more likely to result in a model that can handle a wider variety of unanticipated user questions, and what is the primary reason?
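
For readers unsure what "programmatically collecting and filtering" in the second approach might involve, here is a minimal sketch in Python. It is an illustration only, not a method described in the question: the `QAPair` record, its `score` field (standing in for a community vote count), the thresholds, and the sample data are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """Hypothetical record for one scraped question-answer pair."""
    question: str
    answer: str
    score: int  # assumed community vote count; real sites expose different signals


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical questions compare equal."""
    return " ".join(text.lower().split())


def filter_pairs(pairs, min_score=2, min_answer_words=5):
    """Keep pairs that pass simple quality heuristics and are not duplicates.

    The thresholds are illustrative assumptions, not recommended values.
    """
    seen = set()
    kept = []
    for pair in pairs:
        if pair.score < min_score:
            continue  # drop low-rated answers
        if len(pair.answer.split()) < min_answer_words:
            continue  # drop answers too short to be informative
        key = normalize(pair.question)
        if key in seen:
            continue  # drop repeated questions
        seen.add(key)
        kept.append(pair)
    return kept


# Made-up sample data standing in for scraped content.
raw = [
    QAPair("How do I reverse a list in Python?",
           "Call list.reverse() or use slicing with [::-1].", 12),
    QAPair("how do I reverse a list in  python?",
           "Use the reversed() built-in to get an iterator.", 4),
    QAPair("Why is the sky blue?", "Rayleigh scattering.", 30),
    QAPair("What is a mutex?",
           "A lock that lets only one thread enter a critical section at a time.", 1),
]
filtered = filter_pairs(raw)
```

With this sample, only the first pair survives: the second is a duplicate question, the third has too short an answer, and the fourth falls below the score threshold. Real pipelines typically apply many more checks, but the idea of cheap, automated heuristics applied at scale is the same.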

Updated 2025-09-26
