Learn Before
Diagnosing a Flaw in an Automated Data Generation Process
A team is developing a dataset to train a language model. After inspecting the automatically generated data, they find it is not suitable for their goal. Analyze the provided data generation log and explain the critical flaw in the process. Why would this flaw lead to a poorly performing model?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Filtering in Self-Instruct
In an automated process for generating training data, a language model has just created a new, unique instruction: 'Write a product description for a fictional gadget.' To complete the data instance for this instruction, what is the essential next task for the model?
Example of a Prompt Template for Sample Generation in Self-Instruct
An automated system for creating training data has just generated a new instruction: 'Summarize the provided text into a single sentence.' In the subsequent step, the system produces the following text: 'The main character overcomes several obstacles to achieve their lifelong dream.' Based on the requirements for creating a complete data instance, what crucial component is missing from this generated sample?