Activity (Process)

Training Data Generation for Next Sentence Prediction

To create training data for the Next Sentence Prediction (NSP) task, pairs of sentences (SentA and SentB) are sampled from a text corpus. Positive examples pair two consecutive sentences, so SentB actually follows SentA in the source text. Negative examples pair SentA with a sentence selected at random from the corpus, so SentB is unlikely to be its true continuation. Labeling each pair this way turns NSP into a binary classification problem: given a pair, the model predicts whether SentB follows SentA.
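The sampling procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `make_nsp_pairs`, the `negative_prob` parameter (defaulting to an even split between positive and negative pairs), the integer labels, and the rule that a random draw is rejected if it happens to be the true next sentence are all assumptions made for the example. The corpus is assumed to be a list of documents, each already split into sentences.

```python
import random


def make_nsp_pairs(documents, negative_prob=0.5, seed=0):
    """Build (sent_a, sent_b, label) triples for NSP.

    label 1: sent_b is the sentence that follows sent_a (positive).
    label 0: sent_b was drawn at random from the corpus (negative).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Index every sentence by (document index, sentence index) so a random
    # draw can be compared against the true next sentence.
    positions = [(d, i) for d, doc in enumerate(documents) for i in range(len(doc))]
    if len(positions) < 3:
        raise ValueError("need at least three sentences to sample negatives")

    examples = []
    for d, doc in enumerate(documents):
        # Every sentence except the last one in a document can serve as SentA.
        for i in range(len(doc) - 1):
            sent_a = doc[i]
            if rng.random() < negative_prob:
                # Negative example: resample until the draw is neither SentA
                # itself nor its true successor.
                while True:
                    rd, ri = rng.choice(positions)
                    if (rd, ri) not in ((d, i), (d, i + 1)):
                        break
                examples.append((sent_a, documents[rd][ri], 0))
            else:
                # Positive example: the consecutive sentence.
                examples.append((sent_a, doc[i + 1], 1))
    return examples


corpus = [
    ["The cat sat on the mat.", "It fell asleep.", "Later it woke up."],
    ["Stocks rose on Monday.", "Analysts were surprised."],
]
for sent_a, sent_b, label in make_nsp_pairs(corpus, seed=42):
    print(label, "|", sent_a, "->", sent_b)
```

Each triple can then be fed to a binary classifier, with SentA and SentB as the input pair and the label as the target.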


Updated 2026-04-16


