Learn Before
Input Sequence Formatting for Span Prediction
To prepare the input for a span prediction model, the query and the context are combined into a single sequence. This is typically done by concatenating the query tokens x1 x2 ... xm, a special separator token [SEP], the context tokens y1 y2 ... yn, and a final [SEP] token. This packed sequence, x1 x2 ... xm [SEP] y1 y2 ... yn [SEP], is then fed into the model.
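The packing step above can be sketched as a small helper. This is a minimal illustration assuming whitespace-pretokenized input; the function name is hypothetical, and real systems use a subword tokenizer and may also prepend a [CLS] token.

```python
def pack_span_prediction_input(query_tokens, context_tokens, sep="[SEP]"):
    """Concatenate query tokens, a separator, the context tokens,
    and a final separator into one input sequence:
    x1 x2 ... xm [SEP] y1 y2 ... yn [SEP]
    """
    return query_tokens + [sep] + context_tokens + [sep]

# Hypothetical pre-tokenized query and context.
query = ["What", "is", "the", "capital", "of", "France", "?"]
context = ["Paris", "is", "the", "capital", "of", "France", "."]

packed = pack_span_prediction_input(query, context)
# The query comes first, then [SEP], then the context, then a final [SEP].
```

The model's start/end prediction layers then operate on the contextualized vectors produced for the context-token positions of this packed sequence.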
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
Span Prediction Loss Function
Inference for Span Prediction
Illustration of BERT-based Architecture for Span Prediction
Input Sequence Formatting for Span Prediction
Applying Prediction Networks to Context Token Outputs
An engineer is designing a model to extract answers from a paragraph. The model must identify a continuous segment of text (a 'span') that answers a given question. The model's base component processes the input and produces a contextualized vector representation for each token in the paragraph. Considering the task is to identify the start and end points of the answer, which of the following architectural designs for the final prediction layer is most appropriate?
Debugging a Question-Answering Model Architecture
Comparing Model Architectures for Text Extraction Tasks
Learn After
A language model is designed to find an answer by identifying a specific segment of text (a 'span') within a larger context, based on a given query. If the query is 'What is the capital of France?' and the context is 'Paris is the capital and most populous city of France.', which of the following shows the correctly formatted single input sequence that should be fed into the model?
A language model designed for span-based question answering needs the query and the context document to be combined into a single input string. Arrange the following components into the correct structural order.
Rationale for Combined Input Sequence