Evaluating NER Output Sequences
An NER model processed the sentence 'Jean Valjean left Paris.' and produced two potential tag sequences. Sequence A was generated by greedily selecting the most probable tag for each word independently. Sequence B was generated using an algorithm (such as the Viterbi algorithm) that finds the most likely overall sequence by considering the relationships between adjacent tags. Analyze the two sequences below and determine which one represents a valid output. Justify your choice by explaining the fundamental flaw in the method that produced the invalid sequence.
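The flaw in greedy decoding can be made concrete with a short sketch. The probability table below is purely illustrative (invented numbers, not the output of any real model): picking each word's most probable tag independently can yield a BIO sequence that is structurally invalid, such as an I-LOC tag immediately following B-PER.

```python
# Hypothetical per-word tag probabilities for "Jean Valjean left Paris"
# (illustrative numbers only, not taken from a real NER model).
probs = {
    "Jean":    {"B-PER": 0.60, "I-PER": 0.20, "B-LOC": 0.05, "I-LOC": 0.05, "O": 0.10},
    "Valjean": {"B-PER": 0.05, "I-PER": 0.35, "B-LOC": 0.15, "I-LOC": 0.40, "O": 0.05},
    "left":    {"B-PER": 0.01, "I-PER": 0.02, "B-LOC": 0.01, "I-LOC": 0.01, "O": 0.95},
    "Paris":   {"B-PER": 0.05, "I-PER": 0.05, "B-LOC": 0.80, "I-LOC": 0.05, "O": 0.05},
}

def greedy_decode(words):
    """Pick the most probable tag for each word independently."""
    return [max(probs[w], key=probs[w].get) for w in words]

def is_valid_bio(tags):
    """Every I-X must be preceded by B-X or I-X of the same entity type."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev not in ("B-" + tag[2:], "I-" + tag[2:]):
            return False
        prev = tag
    return True

tags = greedy_decode(["Jean", "Valjean", "left", "Paris"])
print(tags)                # ['B-PER', 'I-LOC', 'O', 'B-LOC']
print(is_valid_bio(tags))  # False: I-LOC cannot follow B-PER
```

Because each decision ignores its neighbors, greedy decoding happily emits an I-LOC that no B-LOC ever opened. Sequence-level decoding rules such paths out (or scores them very low) through transition probabilities between adjacent tags.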
Ch.2 Generative Models - Foundations of Large Language Models
Related
Evaluating NER Output Sequences
In a Named Entity Recognition (NER) system, after a model has calculated the probability of each possible tag (e.g., B-PER, I-PER, O) for each word, a 'greedy' decoding strategy would simply choose the most probable tag for each word independently. Which of the following statements best explains why this greedy approach can fail to produce the optimal sequence of tags for the entire sentence?
A Named Entity Recognition (NER) model processes the phrase 'Washington Post'. For each word, it calculates a score for the most plausible tags, as shown below:
Word        Tag    Score
Washington  B-PER  0.9
Washington  B-ORG  0.8
Post        O      0.7
Post        I-ORG  0.6

The model has also learned from its training data that a 'B-ORG' tag is very likely to be followed by an 'I-ORG' tag. A simple 'greedy' approach, which picks the highest-scoring tag for each word independently, would output the sequence [B-PER, O]. However, an optimal decoding algorithm that also considers the likelihood of tag-to-tag transitions would output the correct sequence [B-ORG, I-ORG]. What fundamental principle of finding the best label sequence does this example illustrate?
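The arithmetic behind that correction can be checked with a small sketch. The emission scores below are the ones given for 'Washington Post'; the transition scores are invented for illustration (the source only states that B-ORG → I-ORG is very likely, so it gets a high value, and the impossible B-PER → I-ORG gets zero). For a two-word sentence, exhaustively scoring every candidate path is equivalent to Viterbi decoding:

```python
from itertools import product

# Emission scores from the example table.
emit = {("Washington", "B-PER"): 0.9, ("Washington", "B-ORG"): 0.8,
        ("Post", "O"): 0.7, ("Post", "I-ORG"): 0.6}

# Hypothetical transition scores (assumed for illustration, not from the source).
trans = {("B-PER", "O"): 0.5, ("B-PER", "I-ORG"): 0.0,
         ("B-ORG", "I-ORG"): 0.9, ("B-ORG", "O"): 0.3}

words = ["Washington", "Post"]
cands = {"Washington": ["B-PER", "B-ORG"], "Post": ["O", "I-ORG"]}

def path_score(path):
    """Product of emission scores and tag-to-tag transition scores."""
    score = 1.0
    for w, t in zip(words, path):
        score *= emit[(w, t)]
    for a, b in zip(path, path[1:]):
        score *= trans[(a, b)]
    return score

# Greedy: per-word argmax over emissions only.
greedy = [max(cands[w], key=lambda t: emit[(w, t)]) for w in words]

# Sequence-level: best path over emissions AND transitions.
best = max(product(*(cands[w] for w in words)), key=path_score)

print(greedy)      # ['B-PER', 'O']          score 0.9 * 0.5 * 0.7 = 0.315
print(list(best))  # ['B-ORG', 'I-ORG']      score 0.8 * 0.9 * 0.6 = 0.432
```

The locally best first tag (B-PER, 0.9) loses once the transition to the second tag is priced in: the path [B-ORG, I-ORG] scores 0.432 versus 0.315, which is exactly why the best label sequence must be found globally rather than word by word.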