1Cademy - Analyzing Training Data Quality

Learn Before

Definition of $c_{gold}$

Case Study

Analyzing Training Data Quality

An engineer is training a text-to-text model to perform sentiment classification. After many training runs, the model's accuracy on new, unseen data is very low. The engineer inspects a few samples from the training dataset and finds the following:

classify sentiment: The product was amazing and worked perfectly. → negative
classify sentiment: I was very disappointed with the quality. → positive

In the context of a supervised learning setup, identify the primary issue with these training samples. Specifically, analyze the component of each sample that is meant to serve as the correct answer or "ground-truth" and explain why this issue leads to poor model performance.

0

1

Updated 2025-10-03

Contributors are:

Who are from:

Learn Before

Related