Example

Illustration of BERT for Text-Pair Tasks (Classification and Regression)

This node illustrates the general pipeline for applying BERT to text-pair tasks. Two texts are concatenated into a single input sequence, formatted as [CLS] Text 1 [SEP] Text 2 [SEP]. This sequence is converted to embeddings and processed by BERT to produce hidden states. The aggregate representation from the [CLS] token's hidden state, h_cls, is then fed into a final prediction network. This network can be configured for different tasks, such as outputting a class label for classification or a real-valued score for regression.
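As a concrete illustration of the input format, here is a minimal sketch using the Hugging Face transformers tokenizer; the library choice, the bert-base-uncased checkpoint, and the example sentences are assumptions for illustration, not something this node prescribes.

```python
# Sketch of text-pair encoding with Hugging Face transformers
# (assumed dependency; bert-base-uncased is an illustrative checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Passing two texts makes the tokenizer build the
# [CLS] Text 1 [SEP] Text 2 [SEP] sequence automatically.
encoding = tokenizer("A man is playing a guitar.",
                     "Someone is playing an instrument.")

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'a', 'man', ..., '[SEP]', 'someone', ..., '[SEP]']
```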

The process for classification can be visualized as follows:

Input Tokens:  [CLS] x1 ... xm [SEP] y1 ... yn [SEP]
    ↓
Embeddings:    e_cls, e1, ..., e_len
    ↓
BERT
    ↓
Hidden States: h_cls, h1, ..., h_len
    ↓  (select h_cls)
Prediction Network
    ↓
Class
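A matching code sketch of the classification path, assuming PyTorch and the transformers BertModel; the three-class head and the sentence pair are purely illustrative, not part of the original description.

```python
# Classification sketch: run BERT, select the [CLS] hidden state
# h_cls, and feed it to a prediction network that outputs a class.
# Assumes torch + transformers; num_classes is a hypothetical choice.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
num_classes = 3  # e.g. entailment / neutral / contradiction
head = torch.nn.Linear(bert.config.hidden_size, num_classes)

inputs = tokenizer("A man is playing a guitar.",
                   "Someone is playing an instrument.",
                   return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # (1, len, hidden_size)
    h_cls = hidden[:, 0]                        # select h_cls
    logits = head(h_cls)                        # (1, num_classes)
print(logits.argmax(dim=-1).item())             # predicted class index
```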

Similarly, the process for regression to output a numerical score is:

Input Tokens:  [CLS] x1 ... xm [SEP] y1 ... yn [SEP]
    ↓
Embeddings:    e_cls, e1, ..., e_len
    ↓
BERT
    ↓
Hidden States: h_cls, h1, ..., h_len
    ↓  (select h_cls)
Prediction Network
    ↓
Number (similarity, evaluation score, etc.)
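The regression variant differs only in the prediction network's output size; below is a self-contained sketch under the same assumptions as the classification example (torch, transformers, illustrative checkpoint and sentences).

```python
# Regression sketch: identical pipeline, but the prediction network
# maps h_cls to a single real-valued score (e.g. a similarity score).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
reg_head = torch.nn.Linear(bert.config.hidden_size, 1)

inputs = tokenizer("A man is playing a guitar.",
                   "Someone is playing an instrument.",
                   return_tensors="pt")
with torch.no_grad():
    h_cls = bert(**inputs).last_hidden_state[:, 0]  # select h_cls
    score = reg_head(h_cls).squeeze(-1)             # one number per pair
print(score.item())
```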