Multiple Choice

A team is using a large language model to generate a synthetic dataset for training a sentiment classifier. The goal is to classify user feedback into 'Positive', 'Negative', or 'Neutral' categories. After generating 10,000 examples using a general prompt to create feedback, they find that approximately 80% of the generated samples are 'Positive', 15% are 'Neutral', and only 5% are 'Negative'. Which statement best analyzes the primary issue with this generated dataset and its most likely consequence for the classifier?

0

1

Updated 2025-10-05

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science