Example

Illustration of Transformer Encoding for Sequence Classification

The step-by-step processing of a sentence pair through a Transformer encoder illustrates how sequence classification works. Given a concatenated input sequence such as $[\mathrm{CLS}]$ It is raining . $[\mathrm{SEP}]$ I need an umbrella . $[\mathrm{SEP}]$, the procedure unfolds as follows. First, each input token $x_i$ is mapped to its corresponding embedding vector $\mathbf{e}_i$. Next, the entire sequence of embeddings $(\mathbf{e}_0, \dots, \mathbf{e}_{11})$ is fed into the encoder, which generates a corresponding sequence of contextualized output vectors $(\mathbf{h}_0, \dots, \mathbf{h}_{11})$. Finally, because the first hidden state $\mathbf{h}_0$ (at the $[\mathrm{CLS}]$ position) serves as an aggregate representation of the entire sequence, a Softmax classification layer is applied directly to it to yield a binary prediction, such as whether the second sentence follows the first ("is next" or not).
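A minimal PyTorch sketch of this pipeline follows. It is an illustration, not the book's code: the hyperparameters (`vocab_size`, `d_model`, `nhead`, `num_layers`) are placeholder values, the token ids are random stand-ins for the example sentence pair, and positional and segment embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative placeholder sizes; 12 tokens matches the example sequence
vocab_size, d_model, seq_len = 30000, 512, 12

# Step 1: map each token id x_i to an embedding vector e_i
embedding = nn.Embedding(vocab_size, d_model)

# Steps 2-3: a stack of Transformer encoder layers that turns
# (e_0, ..., e_11) into contextualized vectors (h_0, ..., h_11)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Step 4: a binary classifier applied to h_0 (the [CLS] position)
classifier = nn.Linear(d_model, 2)

# Random stand-in token ids for
# "[CLS] It is raining . [SEP] I need an umbrella . [SEP]"
x = torch.randint(0, vocab_size, (1, seq_len))  # shape: (batch, seq_len)

e = embedding(x)                        # (1, 12, d_model): e_0 ... e_11
h = encoder(e)                          # (1, 12, d_model): h_0 ... h_11
logits = classifier(h[:, 0])            # classify from h_0 only
probs = torch.softmax(logits, dim=-1)   # P("is next"), P("not next")
```

Note that only $\mathbf{h}_0$ feeds the classifier; because self-attention lets the $[\mathrm{CLS}]$ position attend to every other token, its output vector can summarize the whole pair.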
