Sequence Classification Pipeline using the [CLS] Token Output
For sequence-level classification tasks, a standard pipeline is often employed. An input sequence, with a special [CLS] token prepended, is first processed by a Transformer encoder. This yields a sequence of hidden state vectors, {h_0, ..., h_m}. The hidden state corresponding to the [CLS] token, h_0, is then isolated, as it serves as an aggregate representation of the entire sequence. Finally, this single vector is passed through a classification layer, such as a Softmax layer, to produce the final output, for instance one of two labels in a binary classification system.
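The last two steps of this pipeline (isolating the [CLS] hidden state and applying a Softmax classification layer) can be sketched in plain Python. This is a minimal illustration and not a real model: random vectors stand in for the encoder output, the classifier weights are untrained, and names such as `classify_from_cls`, `d_model`, and `num_classes` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_from_cls(hidden_states, weights, bias):
    """Classify a sequence using only the hidden state at position 0.

    hidden_states: list of vectors [h_0, ..., h_m], where h_0 belongs to [CLS].
    weights: one row of length d_model per class (assumed linear layer).
    bias: one scalar per class.
    """
    h_cls = hidden_states[0]  # isolate the [CLS] representation
    logits = [
        sum(w * x for w, x in zip(row, h_cls)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


random.seed(0)
d_model, num_classes = 4, 2  # assumed sizes, for illustration only
tokens = ["[CLS]", "is", "this", "a", "question", "?"]

# Stand-in for the Transformer encoder output: one vector per token.
hidden_states = [[random.gauss(0, 1) for _ in range(d_model)] for _ in tokens]

# Untrained classification layer (random weights, zero bias).
weights = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = classify_from_cls(hidden_states, weights, bias)
predicted = max(range(num_classes), key=lambda k: probs[k])
print(probs, predicted)
```

In this sketch the head reads only h_0. In an actual Transformer encoder, h_0 depends on every input token through self-attention, which is what allows it to act as a summary of the whole sequence.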

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Evaluating a Sequence Representation Method
A machine learning engineer is building a model to classify sentences as either 'question' or 'statement'. They add a special classification token to the beginning of each input sentence before passing it to an encoder. The encoder then produces a final hidden state vector for every token in the input. For the final classification step, which hidden state vector should be used as the representative summary of the entire sentence?
Debugging a Sequence Classification Model
In a sequence classification task, the special token prepended to the input is designed so that its initial vector representation, before being processed by the main model, contains a summary of the entire sequence's meaning.
Classification on Sequence Representation
Learn After
An engineer is building a model to classify customer reviews as 'positive' or 'negative'. The input text is first prepared by adding a special classification token to the very beginning. This entire sequence is then processed by an encoder, which generates a final output vector for each token. To make the final classification decision for the entire review, which specific vector should be passed to the final classification layer?
A data scientist is using a pre-trained Transformer model for a sentiment analysis task (classifying text as positive or negative). Arrange the following steps in the correct order to form the classification pipeline.
Diagnosing a Flawed Classification Pipeline