Role of the [CLS] Token in Sequence Classification
The [CLS] token is a special symbol prepended to an input sequence before it is processed by an encoder. While it serves as a start symbol at the input stage, its primary function is realized at the output: the encoder's final hidden state at the [CLS] position is typically used as the aggregate representation of the entire sequence. This single vector is then fed into a classification head, such as a linear layer followed by a Softmax, for sequence-level tasks.
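The idea can be sketched numerically. The following is a minimal illustration, not a real encoder: the hidden states are random stand-ins for what a model like BERT would compute, and the dimensions, token list, and weight matrices are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the text): hidden size 8, 3 classes.
hidden_size, num_classes = 8, 3

# [CLS] is prepended to the token sequence before encoding.
tokens = ["[CLS]", "the", "movie", "was", "great"]

# Stand-in for the encoder's final hidden states: one vector per token.
# A real encoder would compute these with self-attention over the sequence.
H = rng.normal(size=(len(tokens), hidden_size))

# The aggregate sequence representation is the final hidden state at the
# [CLS] position, i.e. index 0, since [CLS] was prepended.
h_cls = H[0]

# Classification head: a linear projection followed by Softmax.
W = rng.normal(size=(hidden_size, num_classes))
b = np.zeros(num_classes)
logits = h_cls @ W + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # one probability per class
print(probs.sum())  # probabilities sum to 1 (up to floating point)
```

Note that only the single vector at position 0 reaches the classifier; the hidden states of the remaining tokens are used only indirectly, through the attention that produced `h_cls`.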
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences