Example

Schematic Example of a Sentence-Pair Classification Pipeline

The pipeline for a sentence-pair classification task, such as determining whether one sentence follows another, is illustrated by the following schematic. The input sequence is formatted with special tokens, for example [CLS] It is raining . [SEP] I need an umbrella . [SEP]. Here, [CLS] serves as the start symbol (x_0), and [SEP] separates the two sentences. Following the standard Transformer encoding procedure, each token is first mapped to a corresponding embedding e_i. The embedding sequence {e_0, ..., e_m} is then fed into the encoder, which produces a sequence of contextualized output vectors {h_0, ..., h_m}. Because the hidden state h_0 is conventionally used to represent the entire sequence, a Softmax layer is placed on top of it to perform the final binary classification (is-next vs. not-next).

token:     [CLS]  It  is  raining  .   [SEP]  I   need  an  umbrella  .    [SEP]
embedding:  e0    e1  e2    e3     e4   e5    e6   e7   e8    e9      e10   e11
                              ↓  Encoder  ↓
encoding:   h0    h1  h2    h3     h4   h5    h6   h7   h8    h9      h10   h11
                                  ↓
                               Softmax
                                  ↓
                          Is Next or Not?
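The three stages above (embedding lookup, encoding, classification on h_0) can be sketched in a few lines of Python. The "encoder" here is a stand-in random linear map rather than a trained Transformer, and the vocabulary, hidden size, and weights are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["[CLS]", "It", "is", "raining", ".", "[SEP]",
          "I", "need", "an", "umbrella", ".", "[SEP]"]

d = 16  # embedding / hidden size (illustrative)

# 1) Embedding lookup: each token x_i is mapped to a vector e_i.
vocab = {tok: rng.normal(size=d) for tok in set(tokens)}
E = np.stack([vocab[t] for t in tokens])   # shape (12, d): e_0 ... e_11

# 2) "Encoder": a stand-in for a Transformer encoder; any map that
#    produces one contextualized vector h_i per input embedding e_i.
W_enc = rng.normal(size=(d, d))
H = np.tanh(E @ W_enc)                     # shape (12, d): h_0 ... h_11

# 3) Binary classification head: a 2-way Softmax on h_0 ([CLS]).
h0 = H[0]
W_cls = rng.normal(size=(d, 2))
logits = h0 @ W_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()

label = ["IsNext", "NotNext"][int(np.argmax(probs))]
print(H.shape, probs, label)
```

With random weights the predicted label is of course meaningless; the point is the data flow: only h_0 reaches the classifier, while h_1 ... h_11 are used by other pre-training objectives such as masked token prediction.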
Updated 2025-10-09

Tags: Ch.1 Pre-training - Foundations of Large Language Models