Case Study

Diagnosing a Flaw in Training Data Generation

A data scientist is preparing a dataset to train a model that must determine if two sentences are consecutive. They observe that their trained model performs poorly, often incorrectly classifying two sentences from the same paragraph as consecutive, even when they are not. Analyze the data generation process described in the case study below and explain the most likely reason for the model's poor performance.

0

1

Updated 2025-10-10

Contributors are:

Who are from:

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science