Short Answer

Critique of a Data Collection Strategy

A team is building a dataset to fine-tune a language model for generating safe and helpful cooking recipes. Their strategy is to scrape 10 million comment threads from various online cooking forums, assuming that a larger dataset will automatically lead to a better model. Critique this strategy. Identify at least one key attribute of an effective dataset that is being overlooked and explain the likely negative impact on the final model's performance.

0

1

Updated 2025-10-06

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science