Formula

Negative Log-Likelihood Loss for NER

The training loss for a Named Entity Recognition (NER) model is commonly defined as the average negative log-likelihood of the correct tags. This loss function aims to maximize the probability assigned to the ground-truth tag for each token in a sequence. The formula is given by:

$$\mathrm{Loss} = -\frac{1}{m} \sum_{i=1}^{m} \log p_i(\text{tag}_i)$$

Where:

  • $m$ is the total number of tokens in the sequence.
  • $p_i(\text{tag}_i)$ is the probability the model assigns to the correct tag, $\text{tag}_i$, at position $i$.
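As a minimal sketch, the formula can be computed directly from per-token tag probabilities. The tag names and example probabilities below are hypothetical, chosen only to illustrate the averaging of negative log-probabilities:

```python
import math

def ner_nll_loss(probs, tags):
    """Average negative log-likelihood of the correct tags.

    probs: one dict per token, mapping each tag to its predicted probability
    tags:  the ground-truth tag for each token
    """
    m = len(tags)
    # Sum -log p_i(tag_i) over all m tokens, then average.
    return -sum(math.log(probs[i][tags[i]]) for i in range(m)) / m

# Hypothetical 3-token sequence with BIO-style tags.
probs = [
    {"B-LOC": 0.9,  "O": 0.1},
    {"B-LOC": 0.2,  "O": 0.8},
    {"B-LOC": 0.05, "O": 0.95},
]
tags = ["B-LOC", "O", "O"]
loss = ner_nll_loss(probs, tags)
```

A confident model (probability near 1 on every correct tag) drives the loss toward zero, while low probability on any correct tag increases it sharply.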
Updated 2026-04-18
