Evaluating a Synthetic Data Generation Strategy
Based on the following scenario, analyze one major advantage and one significant potential pitfall of the described data generation approach. Frame your analysis by treating the process as a form of data augmentation.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of Dataset Expansion Strategies
A development team has a small, high-quality dataset for training a sentiment analysis model. To improve the model's performance without collecting more user data, they use a powerful, general-purpose language model to paraphrase each existing example, generating five new variations for every original sentence while preserving the sentiment label. This process of creating synthetic training examples is most directly analogous to which traditional machine learning practice, and why?
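The expansion step described in the scenario above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual pipeline: `expand_dataset` and `toy_paraphrase` are hypothetical names. `toy_paraphrase` only stands in for a call to the general-purpose language model, which the scenario does not specify. The sketch makes two properties concrete: the dataset grows sixfold (each original plus five variants), and every variant inherits its source label without being re-verified.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (sentence, sentiment label)
Example = Tuple[str, str]


def expand_dataset(
    examples: List[Example],
    paraphrase: Callable[[str, int], List[str]],
    n_variants: int = 5,
) -> List[Example]:
    """Return each original example followed by n_variants paraphrases.

    Every paraphrase is assigned the original's label unchanged.
    """
    expanded: List[Example] = []
    for text, label in examples:
        expanded.append((text, label))  # keep the original example
        for variant in paraphrase(text, n_variants):
            # The label is copied, not re-checked: a paraphrase that
            # weakens or flips the sentiment becomes label noise here.
            expanded.append((variant, label))
    return expanded


def toy_paraphrase(text: str, n: int) -> List[str]:
    # Hypothetical stand-in for the language model; a real model would
    # return genuinely reworded sentences, not tagged copies.
    return [f"{text} (variant {i + 1})" for i in range(n)]


if __name__ == "__main__":
    seed = [
        ("The battery life is fantastic.", "positive"),
        ("The screen cracked after a week.", "negative"),
    ]
    data = expand_dataset(seed, toy_paraphrase)
    print(len(data))  # 12: two originals plus five variants each
```

Because labels are copied rather than re-checked, a paraphrase that weakens or flips the sentiment enters training as a mislabeled example, and the model has no way to tell it apart from the originals.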