Learn Before
A Named Entity Recognition (NER) model processes the phrase 'Washington Post'. For each word, it calculates a score for the most plausible tags, as shown below:
| Word | Tag | Score |
|---|---|---|
| Washington | B-PER | 0.9 |
| Washington | B-ORG | 0.8 |
| Post | O | 0.7 |
| Post | I-ORG | 0.6 |
The model has also learned from its training data that a 'B-ORG' tag is very likely to be followed by an 'I-ORG' tag. A simple 'greedy' approach, which picks the highest-scoring tag for each word independently, would output the sequence: [B-PER, O]. However, an optimal decoding algorithm that also considers the likelihood of tag-to-tag transitions would output the correct sequence: [B-ORG, I-ORG]. What fundamental principle of finding the best label sequence does this example illustrate?
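The contrast between the two decoding strategies can be sketched in a few lines of Python. The emission scores below come from the table above; the transition scores are assumed values for illustration (the question only states that B-ORG → I-ORG is very likely), and the Viterbi-style search is a minimal sketch, not a full NER decoder.

```python
# Emission scores from the question; transition scores are ASSUMED for illustration.
emission = [
    {"B-PER": 0.9, "B-ORG": 0.8},   # Washington
    {"O": 0.7, "I-ORG": 0.6},       # Post
]
transition = {
    ("B-PER", "O"): 0.5,
    ("B-PER", "I-ORG"): -1e9,       # I-ORG may not follow B-PER
    ("B-ORG", "I-ORG"): 0.9,        # the "very likely" transition
    ("B-ORG", "O"): 0.5,
}

def greedy(emission):
    """Pick the highest-scoring tag for each word independently."""
    return [max(tags, key=tags.get) for tags in emission]

def viterbi(emission, transition):
    """Find the tag sequence maximising total emission + transition score."""
    scores = dict(emission[0])      # best score ending in each tag so far
    backptr = []                    # best previous tag for each current tag
    for tags in emission[1:]:
        new_scores, ptr = {}, {}
        for tag, e in tags.items():
            prev = max(scores,
                       key=lambda p: scores[p] + transition.get((p, tag), -1e9))
            new_scores[tag] = scores[prev] + transition.get((prev, tag), -1e9) + e
            ptr[tag] = prev
        backptr.append(ptr)
        scores = new_scores
    # Trace back from the best final tag.
    path = [max(scores, key=scores.get)]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(greedy(emission))               # ['B-PER', 'O']
print(viterbi(emission, transition))  # ['B-ORG', 'I-ORG']
```

Greedy scoring looks at each word in isolation, so it cannot "see" that choosing B-ORG for the first word unlocks a high-scoring transition into I-ORG; the dynamic-programming search scores whole sequences and therefore finds the globally better path.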
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating NER Output Sequences
In a Named Entity Recognition (NER) system, after a model has calculated the probability for each possible tag (e.g., B-PER, I-PER, O) for each word, a 'greedy' decoding strategy would be to simply choose the most probable tag for each word independently. Which of the following statements best explains why this greedy approach can fail to produce the optimal sequence of tags for the entire sentence?