Concept

Classification on Sequence Representation

A classifier can be constructed on top of the sequence representation vector, denoted by $\mathbf{h}_{\mathrm{cls}}$ (or $\mathbf{h}_0$), which is the encoder's output for the initial $[\mathrm{CLS}]$ token. Using this representation, one can compute the conditional probability of a label $c$, written $\Pr(c \mid \mathbf{h}_{\mathrm{cls}})$. While many loss functions are available for such classification problems, maximum likelihood training typically minimizes the negative log-likelihood of the correct label, e.g. $\mathrm{Loss}_{\mathrm{NSP}} = -\log \Pr(c \mid \mathbf{h}_{\mathrm{cls}})$ for the Next Sentence Prediction task.
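As a minimal sketch of this idea (assuming a NumPy-style setup; the weight matrix `W`, bias `b`, and hidden size are illustrative, not from the original text), the classifier head is just a linear map on $\mathbf{h}_{\mathrm{cls}}$ followed by a softmax, and the NSP loss is the negative log-probability of the gold label:

```python
import numpy as np

def classifier_head(h_cls, W, b):
    """Linear layer + softmax applied to the [CLS] representation.
    h_cls: (d,) encoder output for [CLS]; W: (num_labels, d); b: (num_labels,).
    Returns Pr(c | h_cls) for every label c."""
    logits = W @ h_cls + b
    logits = logits - logits.max()      # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def nsp_loss(h_cls, W, b, gold_label):
    """Maximum-likelihood (cross-entropy) loss: -log Pr(gold_label | h_cls)."""
    return -np.log(classifier_head(h_cls, W, b)[gold_label])

# Toy example: hidden size d = 4, two labels (e.g. IsNext / NotNext).
rng = np.random.default_rng(0)
h = rng.standard_normal(4)              # stand-in for the encoder's [CLS] output
W = rng.standard_normal((2, 4))
b = np.zeros(2)
probs = classifier_head(h, W, b)        # probabilities over the two labels
loss = nsp_loss(h, W, b, gold_label=0)  # -log Pr(label 0 | h)
```

In practice `h_cls` would come from a trained encoder and `W`, `b` would be learned jointly with it; the random values here only illustrate the shapes and the probability computation.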

Updated 2026-04-17

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Computing Sciences
