Case Study

Analyzing Training Data Quality

An engineer is training a text-to-text model to perform sentiment classification. After many training runs, the model's accuracy on new, unseen data is very low. The engineer inspects a few samples from the training dataset and finds the following:

  1. classify sentiment: The product was amazing and worked perfectly. → negative
  2. classify sentiment: I was very disappointed with the quality. → positive

In a supervised learning setup, identify the primary issue with these training samples. Specifically, analyze the component of each sample that is meant to serve as the correct answer (the "ground truth"), and explain why this issue leads to poor model performance.
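As a starting point for the inspection, the two samples can be written as (source, target) pairs and each target compared against a crude keyword heuristic. This is only a minimal sketch: the keyword sets, the function names `heuristic_sentiment` and `find_suspect_labels`, and the pair representation are assumptions made for illustration, not part of the case study or of any particular library.

```python
# Hypothetical keyword lists, chosen only for this illustration.
POSITIVE_WORDS = {"amazing", "perfectly", "great", "excellent"}
NEGATIVE_WORDS = {"disappointed", "terrible", "poor", "broken"}

# The two samples from the case study, split into (source, target) pairs.
samples = [
    ("classify sentiment: The product was amazing and worked perfectly.", "negative"),
    ("classify sentiment: I was very disappointed with the quality.", "positive"),
]


def heuristic_sentiment(text):
    """Return 'positive', 'negative', or None based on a crude keyword count."""
    words = {w.strip(".,!?:").lower() for w in text.split()}
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # the heuristic has no opinion


def find_suspect_labels(pairs):
    """Collect samples whose target disagrees with the keyword heuristic."""
    suspects = []
    for source, target in pairs:
        guess = heuristic_sentiment(source)
        if guess is not None and guess != target:
            suspects.append((source, target, guess))
    return suspects


suspects = find_suspect_labels(samples)
for source, target, guess in suspects:
    print(f"target={target!r} but heuristic suggests {guess!r}: {source}")
```

A keyword heuristic like this is far too weak to relabel data on its own. Its only role here is to surface samples worth a manual look.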

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science