BERT-based Architecture for Span Prediction
A common architecture for span prediction tasks uses BERT by concatenating the query and the context text into a single input sequence. To identify the answer span, two separate prediction networks are placed on top of BERT's output layer. For each token in the context text, the first network produces the probability that the token is the start of the answer span, and the second produces the probability that it is the end of the span. Both prediction networks are applied only to the outputs corresponding to context tokens.
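The two prediction heads described above can be sketched as follows. This is a minimal NumPy sketch with made-up dimensions and random weights standing in for BERT's outputs and trained parameters; in practice each head is a learned linear layer followed by a softmax over the context tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_ctx = 8, 5                  # hidden size, number of context tokens
H = rng.normal(size=(n_ctx, d_model))  # BERT output vectors for context tokens only

# Two separate prediction networks: one linear scorer for "start", one for "end".
w_start = rng.normal(size=d_model)
w_end = rng.normal(size=d_model)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p_start = softmax(H @ w_start)  # probability each context token starts the span
p_end = softmax(H @ w_end)      # probability each context token ends the span

print(p_start.sum(), p_end.sum())  # each distribution sums to 1
```

Note that the softmax is taken only over the context positions, matching the point that the heads are applied exclusively to context-token outputs, not to the query tokens.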
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Related
Illustration of BERT-based Architecture for Named Entity Recognition
Training BERT-based NER Models
BERT-based Architecture for Span Prediction
An engineer is using a pre-trained transformer model to build a system that assigns a grammatical tag (e.g., Noun, Verb, Adjective) to every word in a sentence. After the model processes the input and generates a final hidden state vector for each token, which of the following is the most appropriate architectural choice to generate the tag for each specific word?
A developer is building a model to assign a specific category (e.g., 'Person', 'Location', 'Organization') to each word in a sentence. The model's architecture involves using a large, pre-trained component to understand the context of each word. Arrange the following steps in the correct chronological order that describes how this model processes an input sentence to generate a label for each word.
An engineer is building a system to identify and tag specific medical terms (e.g., 'symptom', 'disease', 'medication') within clinical notes. They are using a large, pre-trained transformer-based model that processes an entire sentence and outputs a contextualized vector representation for each input token. Which of the following describes the most effective and standard final layer design for this token-level classification task?
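The standard design these token-level questions point at is a single shared linear classifier applied to every token's contextualized vector, followed by a softmax over the tag set. A minimal NumPy sketch (the tag names, dimensions, and random weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

tags = ["O", "symptom", "disease", "medication"]  # hypothetical tag set
d_model, n_tokens = 8, 6

H = rng.normal(size=(n_tokens, d_model))   # one contextualized vector per token
W = rng.normal(size=(d_model, len(tags)))  # one shared linear layer for all tokens
b = np.zeros(len(tags))

logits = H @ W + b
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)       # softmax over tags, per token
pred = [tags[i] for i in probs.argmax(axis=1)]  # one tag per input token
```

Because the same weights are applied at every position, the head adds only d_model × |tags| parameters regardless of sentence length, and the pre-trained encoder does the contextual work.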
A language model is tasked with answering a question by identifying the correct text span from a given context. The model works by calculating a probability for each token being the 'start' of the answer and a separate probability for each token being the 'end' of the answer. Consider the following scenario:
Context: 'The first modern Olympic Games were held in Athens, Greece, in 1896. The International Olympic Committee (IOC) was founded in 1894 by Pierre de Coubertin.' Question: 'When was the IOC established?'
The model produces the following highest probabilities:
- Highest Probability Start Token: '1896' (Probability: 0.85)
- Highest Probability End Token: '1894' (Probability: 0.91)
Based on this output, what is the most fundamental reason the model failed to produce a valid answer span?
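The failure mode in the scenario is that the independently chosen start token ('1896') appears after the chosen end token ('1894'), so the two argmaxes do not form a valid span. Valid inference must search over start/end pairs jointly, subject to start ≤ end. A toy sketch with made-up probabilities that mirror the scenario (the start distribution peaks after the end distribution):

```python
import numpy as np

# Toy start/end distributions over 6 context tokens (made-up numbers).
p_start = np.array([0.02, 0.05, 0.03, 0.04, 0.85, 0.01])  # peaks at index 4
p_end   = np.array([0.01, 0.91, 0.03, 0.02, 0.02, 0.01])  # peaks at index 1

# Naive decoding: independent argmaxes can yield end before start (invalid).
s, e = int(p_start.argmax()), int(p_end.argmax())  # here s=4, e=1

# Constrained decoding: maximize p_start[i] * p_end[j] subject to i <= j.
best, best_score = (0, 0), -1.0
for i in range(len(p_start)):
    for j in range(i, len(p_end)):
        score = float(p_start[i] * p_end[j])
        if score > best_score:
            best, best_score = (i, j), score
print(best)  # a valid span with start <= end
```

The constrained search may settle on a pair where neither index is the global argmax of its own distribution, which is exactly why taking the two maxima independently is not a valid decoding strategy.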
Framing a Clinical Information Extraction Task
Applicability of Span Prediction
BERT-based Architecture for Span Prediction
Learn After
Span Prediction Loss Function
Inference for Span Prediction
Illustration of BERT-based Architecture for Span Prediction
Input Sequence Formatting for Span Prediction
Applying Prediction Networks to Context Token Outputs
An engineer is designing a model to extract answers from a paragraph. The model must identify a continuous segment of text (a 'span') that answers a given question. The model's base component processes the input and produces a contextualized vector representation for each token in the paragraph. Considering the task is to identify the start and end points of the answer, which of the following architectural designs for the final prediction layer is most appropriate?
Debugging a Question-Answering Model Architecture
Comparing Model Architectures for Text Extraction Tasks