Evaluating a Data Collection Strategy
Critically evaluate the following data collection strategy. What is its primary strength and its most significant weakness in the context of creating a high-quality dataset for an instruction-following model?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team is pre-training a new language model to follow a wide range of instructions. They recognize that manually creating a massive, diverse, and high-quality dataset of human-written instructions and responses is prohibitively expensive and time-consuming. As a solution, they propose using an existing powerful model to synthetically generate millions of training examples. Which statement best evaluates the most significant risk of this strategy?
Evaluating Data Collection Strategies for Instruction Pre-training