Learn Before
  • General Formulation of a Sequence Model

Role of the [CLS] Token in Sequence Classification

The [CLS] token is a special symbol, denoted as x0x_0, that is prepended to an input sequence before it is processed by an encoder. While it serves as a start symbol at the input stage, its primary function is realized at the output. The encoder's final hidden state corresponding to this token, h0h_0, is typically used as the aggregate representation for the entire sequence. This single vector is then fed into a classification layer, such as Softmax, for sequence-level tasks.

0

1

6 months ago

Contributors are:

Who are from:

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Output Variation in Sequence Models

  • Role of the [CLS] Token in Sequence Classification

  • Masked Language Modeling

  • Input Formatting with Separator Tokens

  • Standard Auto-Regressive Probability Factorization using Embeddings

  • CLS Token as a Start Symbol in Encoder Pre-training

  • Comparison of Context Usage in Causal vs. Masked Language Modeling

  • Applying the General Sequence Model Formulation

  • In the general formulation of a sequence model, o = g(x_0, x_1, ..., x_m; θ), which statement best analyzes the distinct roles of the components?

  • Match each symbol from the general sequence model formulation, o = g(x_0, x_1, ..., x_m; θ), with its correct description.

  • Neural Network as a Parameterized Function g(·; θ)

  • Fundamental Issues in Sequence Model Formulation

Learn After
  • Sequence Classification Pipeline using the [CLS] Token Output

  • Evaluating a Sequence Representation Method

  • A machine learning engineer is building a model to classify sentences as either 'question' or 'statement'. They add a special classification token to the beginning of each input sentence before passing it to an encoder. The encoder then produces a final hidden state vector for every token in the input. For the final classification step, which hidden state vector should be used as the representative summary of the entire sentence?

  • Debugging a Sequence Classification Model

  • In a sequence classification task, the special token prepended to the input is designed so that its initial vector representation, before being processed by the main model, contains a summary of the entire sequence's meaning.