Learn Before
BERT-based Architecture for Sequence Labeling
A common architecture for sequence labeling tasks like Named Entity Recognition (NER) uses BERT. The input sequence is first tokenized, and special tokens like [CLS] and [SEP] are added to the beginning and end, respectively. This sequence is then passed through the BERT model to obtain a contextualized hidden state representation for each token. Finally, a classification layer is placed on top of each token's hidden state (excluding the special tokens) to predict its corresponding label from a predefined set, such as {B, I, O} for NER.
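The pipeline above (per-token hidden states feeding a shared classification head, with the special tokens excluded) can be sketched in a few lines. This is an illustrative mock-up with made-up small dimensions and random vectors standing in for BERT's output, not a real pre-trained model; the variable names and the tiny {B, I, O} label set follow the text.

```python
import numpy as np

# Hypothetical sizes for illustration (real BERT-base uses hidden_size=768).
hidden_size = 8
labels = ["B", "I", "O"]  # the {B, I, O} tag set from the text

rng = np.random.default_rng(0)

# Stand-in for BERT output: one contextualized vector per token,
# including [CLS] at position 0 and [SEP] at the end.
tokens = ["[CLS]", "New", "York", "is", "big", "[SEP]"]
hidden_states = rng.normal(size=(len(tokens), hidden_size))

# Token-level classification head: one linear layer shared across positions.
W = rng.normal(size=(hidden_size, len(labels)))
b = np.zeros(len(labels))

logits = hidden_states @ W + b        # shape (seq_len, num_labels)
predictions = logits.argmax(axis=-1)  # predicted label index per token

# Read off labels, excluding the special tokens.
tagged = [(tok, labels[p])
          for tok, p in zip(tokens, predictions)
          if tok not in ("[CLS]", "[SEP]")]
print(tagged)
```

Note that the same weight matrix `W` is applied at every position: the head classifies each token independently given its contextualized vector, which is why the special tokens can simply be masked out when reading predictions.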

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Part-of-Speech (POS) Tagging
Span Prediction in NLP
Definition of Named Entity Recognition
A model is designed to perform a sequence labeling task by identifying organizations and locations within a text. For each word (token), it must assign one of the following labels:
O (not an entity), B-ORG (beginning of an organization), I-ORG (inside an organization), B-LOC (beginning of a location), or I-LOC (inside a location). Given the sentence 'The United Nations headquarters in New York City is a major landmark', which of the following represents the correct sequence of labels?
Applicability of Sequence Labeling
Analyzing a Sequence Labeling Model's Output
Negative Likelihood Loss in Sequence Labeling
Learn After
Illustration of BERT-based Architecture for Named Entity Recognition
Training BERT-based NER Models
BERT-based Architecture for Span Prediction
An engineer is using a pre-trained transformer model to build a system that assigns a grammatical tag (e.g., Noun, Verb, Adjective) to every word in a sentence. After the model processes the input and generates a final hidden state vector for each token, which of the following is the most appropriate architectural choice to generate the tag for each specific word?
A developer is building a model to assign a specific category (e.g., 'Person', 'Location', 'Organization') to each word in a sentence. The model's architecture involves using a large, pre-trained component to understand the context of each word. Arrange the following steps in the correct chronological order that describes how this model processes an input sentence to generate a label for each word.
An engineer is building a system to identify and tag specific medical terms (e.g., 'symptom', 'disease', 'medication') within clinical notes. They are using a large, pre-trained transformer-based model that processes an entire sentence and outputs a contextualized vector representation for each input token. Which of the following describes the most effective and standard final layer design for this token-level classification task?