Learn Before
Analyzing Model Training Loss
A language model is being trained on a binary classification task to determine if sentence B is the actual sentence that follows sentence A. Consider two different training examples and the model's predictions for the correct label in each case. Based on the standard negative log-likelihood loss function used for such tasks, which example would result in a higher loss value, and why?
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained on a task to determine if two sentences are consecutive. For a specific pair of sentences where the second sentence is the correct follow-up, the model's final classifier outputs a probability of 0.8 for the 'IsNext' label. Based on the standard negative log-likelihood loss function used for this task, what is the calculated loss value for this single training example? (Note: Use the natural logarithm, ln).
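The loss for this example can be checked directly. A minimal sketch, assuming the standard single-example negative log-likelihood for binary classification, loss = -ln(p), where p is the model's probability for the correct label (the function name `nll_loss` is illustrative, not from the original):

```python
import math

def nll_loss(p_correct):
    # Negative log-likelihood for one example: -ln(p) of the correct label.
    return -math.log(p_correct)

# Model assigns probability 0.8 to the correct 'IsNext' label.
loss = nll_loss(0.8)
print(round(loss, 4))  # -ln(0.8) ≈ 0.2231
```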
Analyzing Model Training Loss
For the task of predicting if two sentences are consecutive, a higher model-predicted probability for the correct label (e.g., 'IsNext' or 'NotNext') will result in a lower calculated loss value for that training example, since the negative log-likelihood loss is -ln(p) and ln is monotonically increasing.
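This monotonic relationship can be demonstrated numerically. A short sketch, assuming the per-example loss -ln(p) (the probability values chosen here are illustrative):

```python
import math

# -ln(p) decreases as p increases: more confident correct
# predictions incur a smaller loss.
probs = [0.5, 0.8, 0.99]
losses = [-math.log(p) for p in probs]
print([round(l, 4) for l in losses])
assert losses[0] > losses[1] > losses[2]
```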