Example

Illustration of BERT for Text-Pair Tasks (Classification and Regression)

This node illustrates the general pipeline for applying BERT to text-pair tasks. Two texts are concatenated into a single input sequence, formatted as [CLS] Text 1 [SEP] Text 2 [SEP]. This sequence is converted to embeddings and processed by BERT to produce hidden states. The aggregate representation from the [CLS] token's hidden state, h_cls, is then fed into a final prediction network. This network can be configured for different tasks, such as outputting a class label for classification or a real-valued score for regression.
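As a concrete illustration of the input format, here is a minimal sketch using the Hugging Face transformers tokenizer; the library choice, the bert-base-uncased checkpoint, and the example sentences are assumptions for illustration, not something this node prescribes.

```python
# Sketch of text-pair encoding with Hugging Face transformers
# (assumed dependency; bert-base-uncased is an illustrative checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Passing two texts makes the tokenizer build the
# [CLS] Text 1 [SEP] Text 2 [SEP] sequence automatically.
encoding = tokenizer("A man is playing a guitar.",
                     "Someone is playing an instrument.")

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'a', 'man', ..., '[SEP]', 'someone', ..., '[SEP]']
```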

The process for classification can be visualized as follows:

Input Tokens:  [CLS] x1 ... xm [SEP] y1 ... yn [SEP]
    ↓
Embeddings:    e_cls, e1, ..., e_len
    ↓
BERT
    ↓
Hidden States: h_cls, h1, ..., h_len
    ↓  (select h_cls)
Prediction Network
    ↓
Class
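A matching code sketch of the classification path, assuming PyTorch and the transformers BertModel; the three-class head and the sentence pair are purely illustrative, not part of the original description.

```python
# Classification sketch: run BERT, select the [CLS] hidden state
# h_cls, and feed it to a prediction network that outputs a class.
# Assumes torch + transformers; num_classes is a hypothetical choice.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
num_classes = 3  # e.g. entailment / neutral / contradiction
head = torch.nn.Linear(bert.config.hidden_size, num_classes)

inputs = tokenizer("A man is playing a guitar.",
                   "Someone is playing an instrument.",
                   return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # (1, len, hidden_size)
    h_cls = hidden[:, 0]                        # select h_cls
    logits = head(h_cls)                        # (1, num_classes)
print(logits.argmax(dim=-1).item())             # predicted class index
```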

Similarly, the process for regression to output a numerical score is:

Input Tokens:  [CLS] x1 ... xm [SEP] y1 ... yn [SEP]
    ↓
Embeddings:    e_cls, e1, ..., e_len
    ↓
BERT
    ↓
Hidden States: h_cls, h1, ..., h_len
    ↓  (select h_cls)
Prediction Network
    ↓
Number (similarity, evaluation score, etc.)
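The regression variant differs only in the prediction network's output size; below is a self-contained sketch under the same assumptions as the classification example (torch, transformers, illustrative checkpoint and sentences).

```python
# Regression sketch: identical pipeline, but the prediction network
# maps h_cls to a single real-valued score (e.g. a similarity score).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
reg_head = torch.nn.Linear(bert.config.hidden_size, 1)

inputs = tokenizer("A man is playing a guitar.",
                   "Someone is playing an instrument.",
                   return_tensors="pt")
with torch.no_grad():
    h_cls = bert(**inputs).last_hidden_state[:, 0]  # select h_cls
    score = reg_head(h_cls).squeeze(-1)             # one number per pair
print(score.item())
```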