Concept

BERT-based Architecture for Span Prediction

A common architecture for span prediction tasks uses BERT by concatenating the query and the context text into a single input sequence. To locate the answer span, two separate prediction networks are placed on top of BERT's output layer. For each token $y_j$ in the context text, the first network produces the probability that $y_j$ is the start of the answer span (denoted $p_j^{\mathrm{beg}}$). The second network produces the probability that $y_j$ is the end of the span (denoted $p_j^{\mathrm{end}}$). Both networks are applied only to the outputs that correspond to context tokens, so the predicted span always lies within the context.
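The following is a minimal pure-Python sketch of the two prediction networks. It assumes each network is a single linear layer whose scores are normalized with a softmax over the context positions only. The helper names (`linear_head`, `span_probabilities`, `best_span`), the toy 2-dimensional "BERT outputs", and the decoding rule (maximize $p_i^{\mathrm{beg}} \cdot p_j^{\mathrm{end}}$ subject to $i \le j$ and a length cap) are illustrative assumptions, not details taken from the text above.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(h: Vector, w: Vector, b: float) -> float:
    """Hypothetical prediction network: one linear layer mapping a hidden vector to a score."""
    return sum(hi * wi for hi, wi in zip(h, w)) + b


def span_probabilities(
    hidden: Sequence[Vector],
    context_start: int,
    context_end: int,
    w_beg: Vector, b_beg: float,
    w_end: Vector, b_end: float,
) -> Tuple[List[float], List[float]]:
    """Return (p_beg, p_end) over context tokens only.

    `hidden` holds one output vector per input token; the context tokens are
    hidden[context_start:context_end]. Query and special tokens are skipped.
    """
    ctx = hidden[context_start:context_end]
    p_beg = softmax([linear_head(h, w_beg, b_beg) for h in ctx])
    p_end = softmax([linear_head(h, w_end, b_end) for h in ctx])
    return p_beg, p_end


def best_span(p_beg: Sequence[float], p_end: Sequence[float], max_len: int = 30) -> Tuple[int, int]:
    """Assumed decoding rule: pick (i, j), i <= j < i + max_len, maximizing p_beg[i] * p_end[j]."""
    best, best_score = (0, 0), -1.0
    for i, pb in enumerate(p_beg):
        for j in range(i, min(i + max_len, len(p_end))):
            score = pb * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy input: [CLS] q1 [SEP] c1 c2 c3 [SEP], so the context occupies positions 3..5.
hidden = [
    [0.0, 0.0],  # [CLS]
    [5.0, 5.0],  # q1: a query token with large values, ignored by both heads
    [0.0, 0.0],  # [SEP]
    [0.0, 0.0],  # c1
    [3.0, 0.0],  # c2: favored by the start head
    [0.0, 3.0],  # c3: favored by the end head
    [0.0, 0.0],  # [SEP]
]
p_beg, p_end = span_probabilities(hidden, 3, 6, [1.0, 0.0], 0.0, [0.0, 1.0], 0.0)
print(best_span(p_beg, p_end))  # (1, 2), i.e. tokens c2..c3
```

Restricting both softmaxes to the context slice is what keeps the query token `q1` from being chosen even though its hidden vector has the largest values. The returned indices are relative to the context, so adding `context_start` recovers positions in the full input sequence.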

Updated 2026-04-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.1 Pre-training - Foundations of Large Language Models
