Learn Before
Next Sentence Prediction Loss Formula
For classification problems like Next Sentence Prediction (NSP), the loss function under maximum likelihood training is defined as the negative log-probability of the correct label given the sequence representation. The specific formula is:

Loss = −log Pr(c | h)

where c represents the correct (or 'gold') label for the current sample, and h is the aggregate sequence representation vector.
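As a minimal sketch of this formula in Python: given the model's predicted probabilities over the two NSP labels, the loss for one training example is just the negative natural log of the probability assigned to the gold label. The label ordering (index 0 = 'IsNext', index 1 = 'NotNext') and the function name are illustrative conventions, not fixed by the formula itself.

```python
import math

def nsp_loss(probs, gold_label):
    """Negative log-likelihood loss for one binary NSP example.

    probs: predicted probabilities over the two labels,
           e.g. [P('IsNext'), P('NotNext')] -- an assumed ordering.
    gold_label: index of the correct ('gold') label for this pair.
    """
    return -math.log(probs[gold_label])

# A confident, correct prediction gives a small loss...
low = nsp_loss([0.95, 0.05], gold_label=0)
# ...while the same confident prediction, when wrong, gives a large one.
high = nsp_loss([0.95, 0.05], gold_label=1)
```

Note that the loss shrinks toward 0 as the probability of the correct label approaches 1, which is why a steadily decreasing NSP loss indicates the classifier is assigning increasing probability to the gold labels.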

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Next Sentence Prediction Loss Formula
A language model is being trained on a binary classification task to determine if two sentences are consecutive. The model's performance is optimized by minimizing a loss value derived from a special token that aggregates information about the sentence pair. If the loss value for this task consistently decreases during training, what is the most accurate interpretation of the model's learning progress?
Diagnosing Training Failure in a Sentence Relationship Task
During the training of a language model on the task of predicting sentence relationships, if the classifier component assigns a very high probability to the correct relationship label for a given sentence pair, the corresponding loss value calculated for that pair will also be very high.
Learn After
A language model is being trained on a task to determine if two sentences are consecutive. For a specific pair of sentences where the second sentence is the correct follow-up, the model's final classifier outputs a probability of 0.8 for the 'IsNext' label. Based on the standard negative log-likelihood loss function used for this task, what is the calculated loss value for this single training example? (Note: Use the natural logarithm, ln).
Analyzing Model Training Loss
For the task of predicting if two sentences are consecutive, a higher model-predicted probability for the correct label (e.g., 'IsNext' or 'NotNext') will result in a higher calculated loss value for that training example.