Example

Illustration of Transformer Encoding for Sequence Classification

The step-by-step processing of a sentence pair through a Transformer encoder illustrates how sequence classification works. Given a concatenated input sequence such as $[\mathrm{CLS}]$ It is raining . $[\mathrm{SEP}]$ I need an umbrella . $[\mathrm{SEP}]$, the procedure unfolds as follows. First, each input token $x_i$ is mapped to its corresponding embedding vector $\mathbf{e}_i$. Next, the entire sequence of embeddings $(\mathbf{e}_0, \dots, \mathbf{e}_{11})$ is fed into the encoder, which generates a corresponding sequence of contextualized output vectors $(\mathbf{h}_0, \dots, \mathbf{h}_{11})$. Finally, because the first hidden state $\mathbf{h}_0$ (at the $[\mathrm{CLS}]$ position) serves as an aggregate representation of the entire sequence, a Softmax classification layer is applied directly to it to yield a binary prediction, such as whether the second sentence follows the first ("is next" or not).
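A minimal PyTorch sketch of this pipeline follows. It is an illustration, not the book's code: the hyperparameters (`vocab_size`, `d_model`, `nhead`, `num_layers`) are placeholder values, the token ids are random stand-ins for the example sentence pair, and positional and segment embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative placeholder sizes; 12 tokens matches the example sequence
vocab_size, d_model, seq_len = 30000, 512, 12

# Step 1: map each token id x_i to an embedding vector e_i
embedding = nn.Embedding(vocab_size, d_model)

# Steps 2-3: a stack of Transformer encoder layers that turns
# (e_0, ..., e_11) into contextualized vectors (h_0, ..., h_11)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Step 4: a binary classifier applied to h_0 (the [CLS] position)
classifier = nn.Linear(d_model, 2)

# Random stand-in token ids for
# "[CLS] It is raining . [SEP] I need an umbrella . [SEP]"
x = torch.randint(0, vocab_size, (1, seq_len))  # shape: (batch, seq_len)

e = embedding(x)                        # (1, 12, d_model): e_0 ... e_11
h = encoder(e)                          # (1, 12, d_model): h_0 ... h_11
logits = classifier(h[:, 0])            # classify from h_0 only
probs = torch.softmax(logits, dim=-1)   # P("is next"), P("not next")
```

Note that only $\mathbf{h}_0$ feeds the classifier; because self-attention lets the $[\mathrm{CLS}]$ position attend to every other token, its output vector can summarize the whole pair.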
