Classification on Sequence Representation
A classifier can be constructed on top of the sequence representation vector, denoted by $\mathbf{h}_0$ (or $\mathbf{h}_{\text{cls}}$), which corresponds to the encoder's output for the initial token (e.g., the [CLS] token in BERT). Using this representation, one can compute the conditional probability of a label $c$ given the input, expressed as $\Pr(c \mid \mathbf{h}_0)$. While many loss functions are available for such classification problems, maximum likelihood training minimizes the negative log-probability of the correct label; for Next Sentence Prediction, this gives the loss $\mathcal{L}_{\text{NSP}} = -\log \Pr(c_{\text{gold}} \mid \mathbf{h}_0)$, where $c_{\text{gold}} \in \{0, 1\}$ indicates whether the second sentence actually follows the first.
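As a concrete sketch of this pipeline, the PyTorch snippet below (the class name, shapes, and random inputs are illustrative assumptions, not from the source) takes the encoder's output for the first token, projects it to label logits with a linear layer, and computes a cross-entropy loss, which equals $-\log \Pr(c_{\text{gold}} \mid \mathbf{h}_0)$:

```python
import torch
import torch.nn as nn

class SequenceClassifierHead(nn.Module):
    """Minimal classification head over the sequence representation h_0."""

    def __init__(self, hidden_size: int, num_labels: int = 2):
        super().__init__()
        # Linear projection from h_0 to one score per label
        # (two labels for NSP: "is next sentence" vs. "is not").
        self.proj = nn.Linear(hidden_size, num_labels)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: [batch, seq_len, hidden_size]
        h0 = encoder_states[:, 0, :]   # encoder output for the initial token
        return self.proj(h0)           # unnormalized label logits

# Toy usage with random tensors standing in for a real encoder's outputs.
batch, seq_len, hidden = 4, 16, 768
encoder_states = torch.randn(batch, seq_len, hidden)
gold = torch.randint(0, 2, (batch,))   # gold NSP labels c_gold

head = SequenceClassifierHead(hidden)
logits = head(encoder_states)

# Cross-entropy on the logits is exactly -log Pr(c_gold | h_0),
# i.e., the NSP loss under maximum likelihood training.
loss = nn.functional.cross_entropy(logits, gold)
loss.backward()
```

Note that only the first token's hidden state feeds the classifier; self-attention in the encoder is what lets that single vector aggregate information from the whole sentence pair.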
