1Cademy - Diagnosing a Flaw in Training Data Generation

Learn Before

Training Data Generation for Next Sentence Prediction

Case Study


A data scientist is preparing a dataset to train a model that must determine whether two sentences are consecutive. After training, the model performs poorly: it often classifies two sentences from the same paragraph as consecutive even when they are not. Analyze the data generation process described in this case study and explain the most likely reason for the model's poor performance.
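Training data for next sentence prediction is commonly generated with a simplified BERT-style recipe: for each sentence A, pair it with its true successor (label 1) about half the time, and otherwise with a randomly chosen sentence from a different document (label 0). The sketch below assumes that recipe; the function name `make_nsp_pairs` and the `in_doc_negatives` flag are hypothetical illustrations, not part of any library or of the case study. It shows one plausible source of the symptom: if every negative comes from another document, the model never sees a non-consecutive pair from the same paragraph, so it can do well during training by judging topical similarity instead of sentence order.

```python
import random
from typing import List, Tuple

# (sentence_a, sentence_b, label); label 1 = consecutive, 0 = not consecutive
Pair = Tuple[str, str, int]


def make_nsp_pairs(docs: List[List[str]], seed: int = 0,
                   in_doc_negatives: bool = False) -> List[Pair]:
    """Build sentence pairs for next sentence prediction (hypothetical helper).

    For each adjacent (A, B) in a document, keep the true pair as a positive
    with probability 0.5; otherwise replace B with a negative sentence.
    Default: the negative comes from a *different* document (BERT-style),
    which yields only "easy", topically unrelated negatives.
    in_doc_negatives=True (assumed option): the negative is a non-adjacent
    sentence from the *same* document, i.e. a "hard" negative.
    """
    rng = random.Random(seed)
    pairs: List[Pair] = []
    for d, doc in enumerate(docs):
        for i in range(len(doc) - 1):
            a = doc[i]
            if rng.random() < 0.5:
                pairs.append((a, doc[i + 1], 1))  # true next sentence
                continue
            if in_doc_negatives:
                # Same document, excluding A itself and its true successor.
                candidates = [s for j, s in enumerate(doc) if j not in (i, i + 1)]
            else:
                # Any sentence from any other document.
                candidates = [s for k, other in enumerate(docs) if k != d for s in other]
            if candidates:  # skip the negative if no valid candidate exists
                pairs.append((a, rng.choice(candidates), 0))
    return pairs
```

Calling `make_nsp_pairs(docs)` reproduces the suspected flaw, since every negative is topically unrelated to sentence A; calling it with `in_doc_negatives=True` produces the same-paragraph, non-consecutive pairs the model is getting wrong. Mixing both kinds of negatives is one possible remedy, offered here as an assumption rather than as the case study's stated answer.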

Updated 2025-10-10
