Learn Before
Finding the Optimal Label Sequence in NER
A crucial step after training a Named Entity Recognition (NER) model is to determine the best possible sequence of labels for a given input. This inference problem is a well-established challenge in NLP. A common and efficient solution is dynamic programming (Viterbi-style decoding), which is analogous to finding the optimal path through a lattice of candidate tags and runs in time linear in the length of the input sequence (though quadratic in the size of the tag set).
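To make the idea concrete, below is a minimal Python sketch of such a dynamic-programming decoder over a tag lattice. The function name, tag set, and the emission and transition score tables are hypothetical placeholders rather than values from a trained model or any particular library; the point is that each word's lattice column only consults the previous column, so the running time grows linearly with sentence length.

    # Minimal sketch of Viterbi-style decoding over a tag lattice (illustrative only).
    def viterbi_decode(emission, transition, tags):
        # emission[i][t]   : score of assigning tag t to word i
        # transition[s][t] : score of tag t directly following tag s
        # Returns the single highest-scoring tag sequence.
        n = len(emission)
        best = [{t: float("-inf") for t in tags} for _ in range(n)]
        back = [{t: None for t in tags} for _ in range(n)]

        for t in tags:                            # column 0 of the lattice
            best[0][t] = emission[0][t]

        for i in range(1, n):                     # one column per word: linear in n
            for t in tags:
                for s in tags:                    # best way to reach tag t from any previous tag s
                    score = best[i - 1][s] + transition[s][t] + emission[i][t]
                    if score > best[i][t]:
                        best[i][t] = score
                        back[i][t] = s

        # Backtrack from the best final tag to recover the full path.
        last = max(tags, key=lambda t: best[n - 1][t])
        path = [last]
        for i in range(n - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))

    # Tiny usage example with made-up scores for the sentence "John runs".
    tags = ["B-PER", "O"]
    emission = [{"B-PER": 0.8, "O": 0.1},         # John
                {"B-PER": 0.2, "O": 0.9}]         # runs
    transition = {"B-PER": {"B-PER": 0.0, "O": 0.3},
                  "O":     {"B-PER": 0.2, "O": 0.4}}
    print(viterbi_decode(emission, transition, tags))   # ['B-PER', 'O']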
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
Application and Advantages
Evaluation of NER
Rule-based Methods
Finding the Optimal Label Sequence in NER
Named Entities
Relation Extraction
Illustration of BERT-based Architecture for Named Entity Recognition
A financial technology company is developing a tool to automatically process business news articles. The goal is to extract specific pieces of information from each article, such as company names, monetary values, and dates, and categorize them accordingly (e.g., 'Apple Inc.' as an ORGANIZATION, '$2.7 billion' as MONEY, 'October 26, 2023' as a DATE). Which of the following processes best describes this core task of identifying and classifying these specific pieces of information?
Choosing the Right Text Processing Approach
Simple Example of an NER Task: Extracting Person Names
Multi-Category Named Entity Recognition Task
Deconstructing Text for Specific Information
NER Output Distributions
Learn After
Evaluating NER Output Sequences
In a Named Entity Recognition (NER) system, after a model has calculated the probability for each possible tag (e.g., B-PER, I-PER, O) for each word, a 'greedy' decoding strategy would be to simply choose the most probable tag for each word independently. Which of the following statements best explains why this greedy approach can fail to produce the optimal sequence of tags for the entire sentence?
A Named Entity Recognition (NER) model processes the phrase 'Washington Post'. For each word, it calculates a score for the most plausible tags, as shown below:
Word        Tag     Score
Washington  B-PER   0.9
Washington  B-ORG   0.8
Post        O       0.7
Post        I-ORG   0.6

The model has also learned from its training data that a 'B-ORG' tag is very likely to be followed by an 'I-ORG' tag. A simple 'greedy' approach, which picks the highest-scoring tag for each word independently, would output the sequence [B-PER, O]. However, an optimal decoding algorithm that also considers the likelihood of tag-to-tag transitions would output the correct sequence [B-ORG, I-ORG]. What fundamental principle of finding the best label sequence does this example illustrate?
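To make the arithmetic behind this example concrete, here is a minimal Python sketch. The emission scores are taken from the table above; the transition scores are assumed values chosen only to reflect the statement that a 'B-ORG' tag is very likely to be followed by 'I-ORG', and are not part of the original material.

    # Greedy vs. transition-aware decoding on the 'Washington Post' example.
    tags = ["B-PER", "B-ORG", "O", "I-ORG"]

    # Emission scores from the table (unlisted tag/word pairs set to 0.0).
    emission = [
        {"B-PER": 0.9, "B-ORG": 0.8, "O": 0.0, "I-ORG": 0.0},   # Washington
        {"B-PER": 0.0, "B-ORG": 0.0, "O": 0.7, "I-ORG": 0.6},   # Post
    ]

    # Assumed transition scores (previous tag -> next tag); 0.0 everywhere except
    # the two transitions that matter here.
    transition = {s: {t: 0.0 for t in tags} for s in tags}
    transition["B-ORG"]["I-ORG"] = 0.9    # organisation beginnings tend to continue
    transition["B-PER"]["I-ORG"] = -1.0   # a person beginning rarely continues as ORG

    # Greedy: pick the highest-scoring tag for each word independently.
    greedy = [max(tags, key=lambda t: scores[t]) for scores in emission]
    print(greedy)        # ['B-PER', 'O']

    # Sequence-level decoding: score whole tag sequences (emissions + transitions).
    # For two words an exhaustive search suffices; Viterbi computes the same
    # result efficiently for long sentences.
    best_seq, best_score = None, float("-inf")
    for t0 in tags:
        for t1 in tags:
            score = emission[0][t0] + transition[t0][t1] + emission[1][t1]
            if score > best_score:
                best_seq, best_score = [t0, t1], score
    print(best_seq)      # ['B-ORG', 'I-ORG']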