Concept

BERT Input Format for Sentence Pairs

When handling sentence pairs, BERT processes them as a single unified sequence. The sequence begins with a [CLS] token, followed by the first sentence (denoted Sent_A), a separator token [SEP], the second sentence (Sent_B), and a concluding [SEP] token. As established in the original BERT paper, the [SEP] token explicitly marks the boundary between the two sentences. The input representation is thus the sequence: [CLS] Sent_A [SEP] Sent_B [SEP].
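The packing described above can be sketched in a few lines of Python. This is a minimal illustration (the helper name `build_bert_pair_input` is hypothetical, not a library API); it also emits the segment ids from the BERT paper, where the first segment — [CLS], Sent_A, and its [SEP] — gets id 0 and the second segment — Sent_B and the final [SEP] — gets id 1:

```python
def build_bert_pair_input(sent_a_tokens, sent_b_tokens):
    """Pack two pre-tokenized sentences into BERT's pair format:
    [CLS] Sent_A [SEP] Sent_B [SEP], with per-token segment ids
    (0 for the first segment, 1 for the second)."""
    tokens = ["[CLS]"] + sent_a_tokens + ["[SEP]"] + sent_b_tokens + ["[SEP]"]
    # Segment A covers [CLS] + Sent_A + first [SEP] (hence +2);
    # segment B covers Sent_B + final [SEP] (hence +1).
    segment_ids = [0] * (len(sent_a_tokens) + 2) + [1] * (len(sent_b_tokens) + 1)
    return tokens, segment_ids

tokens, segs = build_bert_pair_input(["the", "cat", "sat"], ["it", "slept"])
print(tokens)  # ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'slept', '[SEP]']
print(segs)    # [0, 0, 0, 0, 0, 1, 1, 1]
```

In practice a tokenizer library produces this layout automatically when given two text inputs, but the sketch shows exactly where each special token lands.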


Updated 2026-05-02
