Critique of a Data Collection Strategy
A team is building a dataset to fine-tune a language model for generating safe and helpful cooking recipes. Their strategy is to scrape 10 million comment threads from various online cooking forums, assuming that a larger dataset will automatically lead to a better model. Critique this strategy. Identify at least one key attribute of an effective dataset that is being overlooked and explain the likely negative impact on the final model's performance.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Dataset Selection for Model Fine-Tuning
A development team aims to fine-tune a language model to function as a highly reliable and accurate assistant for Python programming. They are considering four different datasets for this supervised fine-tuning process. Which dataset is most likely to result in the best-performing model for their specific goal?
Critique of a Data Collection Strategy