Learn Before
A model for Named Entity Recognition is being trained. During one step, it processes a sentence and produces the probability distributions below for two of the words. The training process aims to adjust the model's parameters by calculating a loss based on the predicted probability of the correct, ground-truth tag for each word.
Word: 'Anya' (Ground-truth tag: I-PER)
B-PER: 0.05, I-PER: 0.85, O: 0.10
Word: 'Berlin' (Ground-truth tag: B-LOC)
B-LOC: 0.10, B-ORG: 0.45, O: 0.45
Based on this information, which word's prediction will contribute a larger value to the overall training loss for this step, and why?
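One way to reason about this is to compute the per-word negative log-likelihood directly. The sketch below (plain Python, no ML framework assumed) applies -log p to the probability each word's distribution assigns to its ground-truth tag, using the numbers from the question above:

```python
import math

# Probability the model assigned to the ground-truth tag for each word
# (values taken from the distributions in the question)
p_correct = {
    "Anya": 0.85,    # ground truth: I-PER
    "Berlin": 0.10,  # ground truth: B-LOC
}

# Per-word negative log-likelihood loss: -log p(correct tag)
losses = {word: -math.log(p) for word, p in p_correct.items()}

for word, loss in losses.items():
    print(f"{word}: {loss:.3f}")
# Anya: 0.163
# Berlin: 2.303
```

Because the loss grows as the probability of the correct tag shrinks, the word whose correct tag received the lower probability contributes far more to the total loss for the step.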
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Negative Log-Likelihood Loss for NER
Model Parameter Adjustment during Training
Consider a model being trained to assign a category tag (e.g., 'Person', 'Location', 'Other') to each word in a sentence. If, for a specific word, the model's output assigns a very high probability (e.g., 0.98) to the correct, ground-truth tag, that word's loss (-log 0.98 ≈ 0.02) is close to zero, so the training process will make only a small adjustment to the model's parameters based on this word's prediction.