1Cademy - Critique of Negative Sample Generation

Learn Before

Training Data Generation for Next Sentence Prediction

Short Answer

Critique of Negative Sample Generation

A common way to build a training dataset for deciding whether two sentences are consecutive is to create 'negative' examples by pairing a sentence with a randomly chosen sentence from a different document. Evaluate a potential weakness of this approach. Specifically, what kinds of subtle relationships between sentences might a model trained this way fail to learn to distinguish?
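
For reference, the sampling scheme described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any particular library's implementation: the function name `make_nsp_pairs`, the representation of each document as a list of sentences, and the 1/0 labels are all hypothetical choices made for this example.

```python
import random


def make_nsp_pairs(documents, rng):
    """Build (sentence_a, sentence_b, label) triples.

    documents: list of documents, each a list of sentence strings (assumed format).
    rng: a random.Random instance, passed in so sampling is reproducible.
    label 1 = sentence_b really follows sentence_a; label 0 = random negative.
    """
    non_empty = [i for i, doc in enumerate(documents) if doc]
    if len(non_empty) < 2:
        raise ValueError("need at least two non-empty documents for negatives")

    pairs = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            # Positive example: the sentence that actually comes next.
            pairs.append((doc[i], doc[i + 1], 1))

            # Negative example: a random sentence from a *different* document.
            other_idx = rng.choice([j for j in non_empty if j != doc_idx])
            pairs.append((doc[i], rng.choice(documents[other_idx]), 0))
    return pairs


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ["The cat sat down.", "It began to purr.", "Then it fell asleep."],
    ["Stocks fell on Monday.", "Analysts blamed rate fears."],
]
for a, b, label in make_nsp_pairs(toy_docs, random.Random(0)):
    print(label, "|", a, "->", b)
```

In this sketch every negative pair is drawn from a different document than its first sentence, which is the property the question asks you to evaluate.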

Updated 2025-10-05
