Evaluating a Data Balancing Strategy
Based on the described scenario, evaluate this specific approach to balancing the dataset. What is a significant potential benefit of this method, and what is a major drawback or risk it introduces?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating a Data Balancing Strategy
A development team is working to mitigate gender bias in a large text dataset. Their sole strategy is to ensure the dataset contains an equal number of sentences mentioning male-associated pronouns (e.g., 'he', 'him') and female-associated pronouns (e.g., 'she', 'her'). Which of the following describes the most significant potential pitfall of relying exclusively on this category balancing method?
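The count-based balancing strategy described in this question can be sketched as follows. This is a minimal illustration, not the team's actual pipeline. It assumes sentences are plain strings and tags each one by the pronouns it contains. The pronoun sets go beyond the question's examples by adding "his" and "hers", which is an assumption. The function names `pronoun_group` and `balance_by_pronoun` are hypothetical. The larger group is randomly downsampled so both groups have equal counts. Sentences with no gendered pronoun, or with both kinds, are kept unchanged.

```python
import random
import re

# Assumed pronoun sets; the question only gives 'he', 'him', 'she', 'her' as examples.
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}


def pronoun_group(sentence):
    """Return 'male', 'female', or None (no gendered pronoun, or both kinds)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    has_male = bool(tokens & MALE)
    has_female = bool(tokens & FEMALE)
    if has_male and not has_female:
        return "male"
    if has_female and not has_male:
        return "female"
    return None


def balance_by_pronoun(sentences, seed=0):
    """Downsample the larger pronoun group so both groups have equal counts."""
    groups = {"male": [], "female": []}
    other = []
    for s in sentences:
        g = pronoun_group(s)
        if g is None:
            other.append(s)
        else:
            groups[g].append(s)
    # Only the counts of each group are equalized; sentence content is not inspected.
    n = min(len(groups["male"]), len(groups["female"]))
    rng = random.Random(seed)
    kept = rng.sample(groups["male"], n) + rng.sample(groups["female"], n)
    return kept + other
```

For example, a corpus with three "he" sentences and one "she" sentence keeps one sentence from each group, plus every sentence that is ungendered or mixed. Downsampling is used here for simplicity. Oversampling the smaller group would be an equally valid way to equalize the counts.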
An AI development team is building a sentiment analysis model for customer reviews of a global product. They discover their initial training data is composed of 85% reviews from North American English speakers and only 5% from Indian English speakers, resulting in significantly lower accuracy for the latter group. To address this issue by directly modifying the dataset's composition, which of the following actions best exemplifies the technique of balancing data categories?