Example

Illustration of BERT-based Architecture for Span Prediction

A BERT-based model for span prediction processes a concatenated query and context to identify an answer span. The input sequence, formatted with [CLS] and [SEP] tokens, is passed to BERT to produce a contextual hidden state h_i for each token. For each token in the context, its hidden state is fed into two separate prediction heads: one computes the probability that the token begins the answer span (p_i^beg), and the other the probability that it ends the span (p_i^end). The accompanying diagram visualizes this flow from input tokens to the final begin/end probability predictions.
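The two prediction heads described above can be sketched as follows. This is a minimal numpy illustration, not a real BERT model: the hidden states are random stand-ins for BERT's output, and the head weights, dimensions, and bias terms are assumptions chosen for the example. Each head is a linear projection of the hidden states followed by a softmax over the context tokens, yielding the distributions p_i^beg and p_i^end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration): 8 context tokens,
# hidden size 16. A real BERT-base model uses hidden size 768.
n_tokens, hidden = 8, 16

# Stand-in for BERT's contextual hidden states h_i, one row per token.
H = rng.normal(size=(n_tokens, hidden))

# Two independent prediction heads: a weight vector and bias each
# for the begin position and the end position.
w_beg, b_beg = rng.normal(size=hidden), 0.0
w_end, b_end = rng.normal(size=hidden), 0.0

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

# p_i^beg and p_i^end: each head scores every token, then the scores
# are normalized over the whole context with a softmax.
p_beg = softmax(H @ w_beg + b_beg)
p_end = softmax(H @ w_end + b_end)

# Predicted span: argmax of each distribution. (A full decoder would
# also enforce begin <= end and bound the span length.)
span = (int(p_beg.argmax()), int(p_end.argmax()))
print(span)
```

Because the two heads are independent, each token receives both a begin probability and an end probability; the answer span is read off from the two distributions jointly.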


Updated 2025-10-09


Tags

Ch.2 Generative Models - Foundations of Large Language Models
