Learn Before
Consequences of Static Prompt Structures in Automated Data Generation
A machine learning team is using a large language model to iteratively generate a large dataset of instructions and corresponding input-output pairs, starting from a small seed set. They employ a single, simple, and unchanging prompt structure to request new data from the model throughout the entire generation process. Analyze the potential negative consequences of this approach on both the quality of the final dataset and the capabilities of a new model that is subsequently fine-tuned on this data.
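The setup described above can be pictured with a minimal sketch. Everything named here is an assumption for illustration: the template wording in `STATIC_PROMPT`, the helpers `build_prompt` and `bootstrap`, and especially `fake_llm`, which is a deterministic stand-in for a real model call rather than any actual API. The sketch only shows the shape of the loop: one fixed template, in-context examples sampled from a growing pool, and generated items fed back into that pool.

```python
import random

# Hypothetical fixed template: the same wording is reused on every call.
STATIC_PROMPT = "Here are some example tasks:\n{examples}\nGenerate a new task."


def build_prompt(pool, k, rng):
    """Sample up to k in-context examples from the pool and fill the fixed template."""
    examples = rng.sample(pool, min(k, len(pool)))
    return STATIC_PROMPT.format(examples="\n".join(f"- {e}" for e in examples))


def fake_llm(prompt):
    """Stand-in for a real model call (assumption, not a real API).

    It returns a variant of the first example shown in the prompt, which is
    a deliberately simplified picture of a model staying close to its context.
    """
    first_example_line = prompt.split("\n")[1]  # e.g. "- Sort a list of integers."
    return first_example_line[2:] + " (variant)"


def bootstrap(seed_tasks, rounds, k=2, seed=0):
    """Grow a task pool from a seed set using the same prompt structure each round."""
    rng = random.Random(seed)
    pool = list(seed_tasks)
    for _ in range(rounds):
        prompt = build_prompt(pool, k, rng)
        pool.append(fake_llm(prompt))  # generated items feed later prompts
    return pool


pool = bootstrap(["Sort a list of integers.", "Reverse a string."], rounds=5)
for task in pool:
    print(task)
```

Because each new item is appended to the pool that later prompts draw from, whatever the template and the model favor can carry forward from round to round; that feedback path is what the question asks you to analyze.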
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Low Diversity in a Generated Dataset
Consequences of Static Prompt Structures in Automated Data Generation
Biased Predictions in LLM-based Synthetic Data Generation
An AI development team is using a large language model to automatically generate a dataset of programming problems and their solutions. They start with a simple instruction-generation prompt like:
Generate a new programming problem.
After generating 10,000 examples, they find that the problems are repetitive (e.g., mostly sorting lists) and the generated solutions are often suboptimal. Which of the following modifications to their process would be the most effective first step to improve both the diversity of the problems and the quality of the solutions?