Learn Before
Training BERT-based NER Models
For Named Entity Recognition (NER) tasks using a BERT-based model, the model outputs a probability distribution, denoted p_i, over the set of possible tags for the token at each position i. The training or fine-tuning process optimizes the model's parameters by using these distributions. A common training loss is the negative log-likelihood, Loss = -Σ_i log p_i(y_i), which is calculated based on p_i(y_i), the model's predicted probability of the correct tag y_i at each position i.
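As a concrete sketch of this loss, the helper below averages -log p_i(y_i) over a tagged sequence. The function name and the toy probabilities are illustrative only, not output from a real model.

```python
import math

def nll_loss(pred_dists, gold_tags):
    """Average negative log-likelihood over a tagged sequence.

    pred_dists: list of dicts mapping tag -> predicted probability,
                one dict per token position.
    gold_tags:  list of ground-truth tags, one per position.
    """
    total = 0.0
    for dist, gold in zip(pred_dists, gold_tags):
        # The loss at each position is -log of the probability
        # the model assigned to the correct tag.
        total += -math.log(dist[gold])
    return total / len(gold_tags)

# Toy two-token example with made-up distributions.
dists = [
    {"B-PER": 0.05, "I-PER": 0.85, "O": 0.10},
    {"B-LOC": 0.10, "B-ORG": 0.45, "O": 0.45},
]
loss = nll_loss(dists, ["I-PER", "B-LOC"])
```

A confidently correct prediction (p close to 1) contributes a loss near zero, while a low probability on the correct tag contributes a large loss.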

Ch.2 Generative Models - Foundations of Large Language Models
Related
Illustration of BERT-based Architecture for Named Entity Recognition
Training BERT-based NER Models
BERT-based Architecture for Span Prediction
An engineer is using a pre-trained transformer model to build a system that assigns a grammatical tag (e.g., Noun, Verb, Adjective) to every word in a sentence. After the model processes the input and generates a final hidden state vector for each token, which of the following is the most appropriate architectural choice to generate the tag for each specific word?
A developer is building a model to assign a specific category (e.g., 'Person', 'Location', 'Organization') to each word in a sentence. The model's architecture involves using a large, pre-trained component to understand the context of each word. Arrange the following steps in the correct chronological order that describes how this model processes an input sentence to generate a label for each word.
An engineer is building a system to identify and tag specific medical terms (e.g., 'symptom', 'disease', 'medication') within clinical notes. They are using a large, pre-trained transformer-based model that processes an entire sentence and outputs a contextualized vector representation for each input token. Which of the following describes the most effective and standard final layer design for this token-level classification task?
Learn After
Negative Log-Likelihood Loss for NER
A model for Named Entity Recognition is being trained. During one step, it processes a sentence and produces the probability distributions below for two of the words. The training process aims to adjust the model's parameters by calculating a loss based on the predicted probability of the correct, ground-truth tag for each word.
Word: 'Anya' (ground-truth tag: I-PER)
  B-PER: 0.05, I-PER: 0.85, O: 0.10
Word: 'Berlin' (ground-truth tag: B-LOC)
  B-LOC: 0.10, B-ORG: 0.45, O: 0.45
Based on this information, which word's prediction will contribute a larger value to the overall training loss for this step, and why?
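The comparison can be worked out directly from the per-token negative log-likelihood described above, using the probabilities each word's distribution assigns to its correct tag:

```python
import math

# Per-token loss is -log of the probability assigned to the correct tag.
loss_anya = -math.log(0.85)    # 'Anya', correct tag I-PER got p = 0.85
loss_berlin = -math.log(0.10)  # 'Berlin', correct tag B-LOC got p = 0.10

# 'Berlin' contributes the larger loss, because the model assigned
# a much lower probability (0.10 vs 0.85) to its correct tag.
```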
Model Parameter Adjustment during Training
Consider a model being trained to assign a category tag (e.g., 'Person', 'Location', 'Other') to each word in a sentence. If, for a specific word, the model's output assigns a very high probability (e.g., 0.98) to the correct, ground-truth tag, the training process will make a large adjustment to the model's parameters based on this specific word's prediction.
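The claim can be checked numerically. Under the negative log-likelihood loss described earlier, a token whose correct tag already receives p = 0.98 contributes almost nothing to the loss:

```python
import math

# Loss contribution from a confidently correct prediction.
confident_correct = -math.log(0.98)  # close to zero

# Because the loss from this token is near zero, the gradient it
# produces is small: training makes only a small adjustment for
# tokens the model already classifies correctly with high confidence.
```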