A development team is fine-tuning a model to be a general-purpose, open-domain question-answering assistant. They are considering two approaches for creating the training dataset:
- Having a small, dedicated team of experts write 10,000 high-quality question-answer pairs.
- Programmatically collecting and filtering 100,000 question-answer pairs from various public Q&A websites (a minimal filtering sketch follows the question).
Which approach is more likely to result in a model that can handle a wider variety of unanticipated user questions, and what is the primary reason?
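The second option describes a programmatic collect-and-filter pipeline. The sketch below shows one way such filtering might look, assuming the raw pairs have already been scraped into a JSON Lines file; the path `raw_qa_pairs.jsonl`, the field names, and the length and duplicate filters are illustrative assumptions rather than a prescribed pipeline.

```python
# Minimal sketch of the "collect and filter" approach. Assumes scraped Q&A pairs
# already sit in a JSONL file (raw_qa_pairs.jsonl is a hypothetical path) with
# "question" and "answer" fields. The filters (length bounds, exact-duplicate
# removal) are illustrative, not a complete cleaning pipeline.
import json

def load_pairs(path):
    """Yield (question, answer) tuples from a JSONL file of scraped pairs."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["question"].strip(), record["answer"].strip()

def filter_pairs(pairs, min_len=10, max_len=2000):
    """Keep pairs within length bounds and drop exact duplicate questions."""
    seen_questions = set()
    for question, answer in pairs:
        if not (min_len <= len(question) <= max_len and min_len <= len(answer) <= max_len):
            continue
        key = question.lower()
        if key in seen_questions:
            continue
        seen_questions.add(key)
        yield {"question": question, "answer": answer}

if __name__ == "__main__":
    kept = list(filter_pairs(load_pairs("raw_qa_pairs.jsonl")))
    print(f"kept {len(kept)} pairs after filtering")
```

Using generators keeps memory use low when the scraped corpus is much larger than the filtered output.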
Tags
Ch.4 Alignment - Foundations of Large Language Models
Related
- Data Sourcing Strategy for Chatbot Fine-Tuning
- Evaluating a Data Sourcing Strategy for a Niche Chatbot