Learn Before
Model Parameter Adjustment during Training
A model is being trained for a text-labeling task. For the input word 'Paris', the correct label is B-LOC. The model's output layer produces the following probability distribution for this word:
B-LOC: 0.3, B-ORG: 0.6, O: 0.1
Describe the primary goal of the training algorithm when it adjusts the model's internal parameters in response to this specific output.
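To make the training signal concrete, here is a minimal sketch of the negative log-likelihood (cross-entropy) loss for this single prediction, assuming the probabilities shown above; the dictionary and variable names are illustrative, not from any particular framework:

```python
import math

# Hypothetical output distribution from the card; the gold label is B-LOC
probs = {"B-LOC": 0.3, "B-ORG": 0.6, "O": 0.1}
gold = "B-LOC"

# Negative log-likelihood of the correct tag: the quantity training minimizes
loss = -math.log(probs[gold])
print(round(loss, 4))  # 1.204
```

Minimizing this loss pushes the model to raise the probability assigned to B-LOC (and, since the distribution must sum to 1, to lower the probability of B-ORG and O).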
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Negative Log-Likelihood Loss for NER
A model for Named Entity Recognition is being trained. During one step, it processes a sentence and produces the probability distributions below for two of the words. The training process aims to adjust the model's parameters by calculating a loss based on the predicted probability of the correct, ground-truth tag for each word.
Word: 'Anya' (ground-truth tag: I-PER)
B-PER: 0.05, I-PER: 0.85, O: 0.10
Word: 'Berlin' (ground-truth tag: B-LOC)
B-LOC: 0.10, B-ORG: 0.45, O: 0.45
Based on this information, which word's prediction will contribute a larger value to the overall training loss for this step, and why?
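The per-word contributions can be checked numerically. This is a minimal sketch using the distributions from the card; the data structure is illustrative:

```python
import math

# (distribution, gold tag) for each word, as given in the card
preds = {
    "Anya":   ({"B-PER": 0.05, "I-PER": 0.85, "O": 0.10}, "I-PER"),
    "Berlin": ({"B-LOC": 0.10, "B-ORG": 0.45, "O": 0.45}, "B-LOC"),
}

# Negative log-likelihood of the gold tag for each word
losses = {word: -math.log(dist[gold]) for word, (dist, gold) in preds.items()}
for word, loss in losses.items():
    print(f"{word}: {loss:.3f}")
# 'Berlin' dominates: -log(0.10) ≈ 2.303 vs -log(0.85) ≈ 0.163
```

Because the loss is the negative log of the probability assigned to the correct tag, the word whose gold tag received the lower probability ('Berlin', at 0.10) contributes far more to the step's loss than 'Anya' (0.85).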
Model Parameter Adjustment during Training
Consider a model being trained to assign a category tag (e.g., 'Person', 'Location', 'Other') to each word in a sentence. If, for a specific word, the model's output assigns a very high probability (e.g., 0.98) to the correct, ground-truth tag, the training process will make a large adjustment to the model's parameters based on this specific word's prediction.
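Under standard cross-entropy training, the size of the update tied to one prediction can be sketched numerically; this assumes a softmax output layer, and the names below are illustrative:

```python
import math

# Hypothetical confident, correct prediction: p = 0.98 on the gold tag
p_correct = 0.98

# Negative log-likelihood is already close to its minimum of zero
loss = -math.log(p_correct)
# For softmax + cross-entropy, the gradient on the gold-tag logit is p - 1
grad = p_correct - 1.0
print(f"loss={loss:.4f}, grad={grad:.2f}")  # loss=0.0202, grad=-0.02
```

A nearly correct prediction therefore produces a near-zero loss and a tiny gradient, so the update driven by this particular word is small, not large.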