Input Inversion for Mitigating Data Generation Bias
When an LLM generates synthetic data, its predicted outputs are often biased toward certain classes, which skews the resulting dataset. Input inversion counteracts this by reversing the usual generation order: the desired output (e.g., a class label) is specified first, and the LLM is then prompted to generate an input that fits both the instruction and that predetermined output. Because the outputs are chosen up front, the distribution of generated samples can be controlled directly, which helps produce a more balanced dataset.
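The idea can be illustrated with a minimal Python sketch. It assumes a two-class sentiment task, and every name in it (`build_inverted_prompts`, `generate_dataset`, `fake_llm`, the prompt wording) is hypothetical and introduced only for illustration. In particular, `fake_llm` is a placeholder that echoes the prompt and would be swapped for a real model call.

```python
from collections import Counter
from typing import Callable


def build_inverted_prompts(labels, n_per_label, task="customer feedback"):
    """Return (label, prompt) pairs in which the label is fixed before generation."""
    prompts = []
    for label in labels:
        for _ in range(n_per_label):
            # The desired output (the label) is part of the prompt, so the
            # model is asked for an input that matches it.
            prompt = (
                f"Write one piece of {task} whose sentiment is {label}. "
                "Return only the feedback text."
            )
            prompts.append((label, prompt))
    return prompts


def generate_dataset(prompts, llm: Callable[[str], str]):
    """Call the model once per prompt and pair its text with the preset label."""
    return [{"text": llm(prompt), "label": label} for label, prompt in prompts]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call (assumption, not a real API).
    return f"<generated text for: {prompt}>"


# Request exactly 500 examples per class, so the balance is set by construction.
prompts = build_inverted_prompts(["Positive", "Negative"], n_per_label=500)
dataset = generate_dataset(prompts, fake_llm)
label_counts = Counter(row["label"] for row in dataset)
print(label_counts)
```

In this sketch the class balance is determined by the prompt list rather than by whatever the model tends to produce, so `label_counts` holds 500 for each class. Whether each generated text truly matches its assigned label still depends on the model and may need separate checking.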
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Analyzing Bias in Synthetic Dataset Generation
A team is using a large language model to generate a synthetic dataset for training a sentiment classifier. The goal is to classify user feedback into 'Positive', 'Negative', or 'Neutral' categories. After generating 10,000 examples using a general prompt to create feedback, they find that approximately 80% of the generated samples are 'Positive', 15% are 'Neutral', and only 5% are 'Negative'. Which statement best analyzes the primary issue with this generated dataset and its most likely consequence for the classifier?
Critiquing a Synthetic Data Generation Method
Learn After
A data scientist is using a large language model to generate synthetic examples of customer feedback for a classification task with two categories: 'Positive Sentiment' and 'Negative Sentiment'. After generating 1,000 examples, they find that 900 are 'Positive Sentiment' and only 100 are 'Negative Sentiment'. Which of the following strategies provides the most direct control to create a new, perfectly balanced dataset of 1,000 examples (500 of each category) during the generation process?
Correcting Imbalance in Synthetic Medical Data Generation
A machine learning engineer needs to generate a perfectly balanced synthetic dataset for a sentiment classification task (50% positive, 50% negative). To achieve this, they decide to reverse the typical generation process to gain direct control over the class distribution. Arrange the following steps in the correct logical order to implement this technique for one class, such as 'Positive'.