Learn Before
Illustration of BERT-based Architecture for Span Prediction
A BERT-based model for span prediction processes a concatenated query and context text to identify an answer span. The input sequence, formatted with [CLS] and [SEP] tokens, is passed to BERT, which generates a contextual hidden state h_i for each token. For each token in the context, its hidden state is fed into two separate prediction heads: one head computes the probability that the token marks the beginning of the answer span, Pr_begin(i), while the other computes the probability that it marks the end, Pr_end(i). This process is visualized in the accompanying diagram, which shows the flow from input tokens to the final begin/end probability predictions.
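The two prediction heads can be sketched as follows. This is a minimal illustration using NumPy with random toy values standing in for BERT's hidden states; the names (H, w_begin, w_end) and the tiny dimensions are assumptions for demonstration, not the book's implementation. Each head is a linear projection of the per-token hidden states, normalized with a softmax over the context positions.

```python
import numpy as np

np.random.seed(0)

seq_len, hidden_dim = 6, 8  # toy sizes; BERT-base actually uses hidden_dim = 768

# Stand-in for the contextual hidden states h_i that BERT would produce,
# one row per context token.
H = np.random.randn(seq_len, hidden_dim)

# Two independent prediction heads: a weight vector for the begin position
# and a separate one for the end position (hypothetical parameters).
w_begin = np.random.randn(hidden_dim)
w_end = np.random.randn(hidden_dim)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Pr_begin(i): probability that token i starts the answer span.
# Pr_end(i):   probability that token i ends the answer span.
p_begin = softmax(H @ w_begin)
p_end = softmax(H @ w_end)

begin, end = int(p_begin.argmax()), int(p_end.argmax())
print("begin =", begin, "end =", end)
```

Because the two heads are separate, errors in predicting the begin position are independent of errors in predicting the end position, which is why the two can fail in different ways during fine-tuning.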
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Span Prediction Loss Function
Inference for Span Prediction
Illustration of BERT-based Architecture for Span Prediction
Input Sequence Formatting for Span Prediction
Applying Prediction Networks to Context Token Outputs
An engineer is designing a model to extract answers from a paragraph. The model must identify a continuous segment of text (a 'span') that answers a given question. The model's base component processes the input and produces a contextualized vector representation for each token in the paragraph. Considering the task is to identify the start and end points of the answer, which of the following architectural designs for the final prediction layer is most appropriate?
Debugging a Question-Answering Model Architecture
Comparing Model Architectures for Text Extraction Tasks
Learn After
A researcher is fine-tuning a model for a question-answering task. The model processes a question and a context paragraph to predict the start and end positions of the answer within the paragraph. After training, the researcher observes a specific performance issue: the model consistently identifies the correct end token of the answer span, but frequently selects an incorrect start token. Based on the typical architecture for this task where separate predictions are made for the start and end points, which component is the most likely source of this specific error pattern?
A model is designed to extract a specific span of text (the answer) from a larger context paragraph based on a given question. Arrange the following steps in the correct logical order that describes how this model processes the information to identify the answer.
Designing a Span Prediction Module