Learn Before
Negative Log-Likelihood Loss for NER
The training loss for a Named Entity Recognition (NER) model is commonly defined as the average negative log-likelihood of the correct tags. This loss function aims to maximize the probability assigned to the ground-truth tag for each token in a sequence. The formula is given by:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log p_{t_i}$$

Where:
- $N$ is the total number of tokens in the sequence.
- $p_{t_i}$ is the probability that the model predicts for the correct tag, $t_i$, at position $i$.
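As a concrete illustration, the average negative log-likelihood can be computed directly from the per-token probabilities of the correct tags. This is a minimal sketch, assuming those probabilities have already been extracted from the model's output distributions:

```python
import math

def nll_loss(correct_tag_probs):
    """Average negative log-likelihood over a sequence.

    correct_tag_probs: the probability the model assigned to the
    ground-truth tag at each token position.
    """
    n = len(correct_tag_probs)
    return -sum(math.log(p) for p in correct_tag_probs) / n

# Confident, correct predictions yield a loss near zero.
print(round(nll_loss([0.9, 0.8, 0.95]), 4))  # ≈ 0.1266
```

Note that a probability of exactly 1.0 at every position gives a loss of 0, and the loss grows without bound as any correct-tag probability approaches 0.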

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
Negative Log-Likelihood Loss for NER
A model for Named Entity Recognition is being trained. During one step, it processes a sentence and produces the probability distributions below for two of the words. The training process aims to adjust the model's parameters by calculating a loss based on the predicted probability of the correct, ground-truth tag for each word.
Word: 'Anya' (Ground-truth tag: I-PER)
- B-PER: 0.05
- I-PER: 0.85
- O: 0.10

Word: 'Berlin' (Ground-truth tag: B-LOC)
- B-LOC: 0.10
- B-ORG: 0.45
- O: 0.45
Based on this information, which word's prediction will contribute a larger value to the overall training loss for this step, and why?
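The comparison can be checked numerically: each word contributes $-\log p$ to the loss, where $p$ is the probability assigned to its ground-truth tag. A quick sketch using the natural logarithm:

```python
import math

# Per-word loss contribution is -log(p), where p is the probability
# the model assigned to the ground-truth tag for that word.
anya_contribution = -math.log(0.85)    # ground truth I-PER received 0.85
berlin_contribution = -math.log(0.10)  # ground truth B-LOC received only 0.10

print(f"Anya:   {anya_contribution:.3f}")    # ≈ 0.163
print(f"Berlin: {berlin_contribution:.3f}")  # ≈ 2.303
```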
Model Parameter Adjustment during Training
Consider a model being trained to assign a category tag (e.g., 'Person', 'Location', 'Other') to each word in a sentence. If, for a specific word, the model's output assigns a very high probability (e.g., 0.98) to the correct, ground-truth tag, that word's loss contribution, $-\log(0.98) \approx 0.02$, is close to zero, so the training process will make only a small adjustment to the model's parameters based on this specific word's prediction.
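The loss contribution, and with it the size of the gradient-driven parameter adjustment, shrinks as the correct-tag probability approaches 1. A quick check with the natural logarithm:

```python
import math

# Loss contribution -log(p) for different correct-tag probabilities:
# near-certain predictions barely move the parameters, while
# low-probability mistakes dominate the update.
for p in (0.98, 0.50, 0.10):
    print(f"p = {p:.2f} -> loss = {-math.log(p):.3f}")
# p = 0.98 -> loss = 0.020
# p = 0.50 -> loss = 0.693
# p = 0.10 -> loss = 2.303
```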
Learn After
Calculating Model Training Loss
A model is being trained for a text labeling task where the goal is to maximize the probability assigned to the correct label for each word. The training loss is calculated as the average of the negative logarithm of these probabilities. Consider the model's performance on one sentence, evaluated by two different sets of parameters (Model A and Model B). The table below shows the probability each model assigned to the correct label for each of the seven words in the sentence.
| Word | Model A Probability | Model B Probability |
| --- | --- | --- |
| Word 1 | 0.9 | 0.8 |
| Word 2 | 0.8 | 0.6 |
| Word 3 | 0.7 | 0.6 |
| Word 4 | 0.9 | 0.8 |
| Word 5 | 0.9 | 0.8 |
| Word 6 | 0.1 | 0.7 |
| Word 7 | 0.9 | 0.8 |

Based on this data, which model would have a lower training loss for this specific sentence, and why?
Impact of Model Confidence on Training Loss