Concept

Per-Token Classification for Encoder Training

A method for training Transformer encoders as classifiers involves applying a distinct supervision signal to the output corresponding to each token in a sequence. In this setup, the model learns by making a classification decision for every individual token, such as deciding whether that token has been replaced (replaced-token detection). This per-token objective, exemplified by the ELECTRA discriminator, contrasts with approaches that produce a single classification for an entire sequence.
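The sketch below illustrates the idea under stated assumptions: it is not ELECTRA's actual implementation, but a minimal PyTorch encoder whose linear head emits one logit per token position, trained with a per-token binary cross-entropy loss (label 1 = replaced, 0 = original). All class and variable names, sizes, and the use of nn.TransformerEncoder as the backbone are illustrative assumptions; positional encodings and generator-side token replacement are omitted for brevity.

```python
import torch
import torch.nn as nn

class PerTokenClassifier(nn.Module):
    """Encoder with a classification head applied to every token position.

    Illustrative sketch: a small nn.TransformerEncoder stands in for the
    pre-trained encoder, and a single logit per token corresponds to a
    binary decision such as "was this token replaced?".
    """

    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # one logit per token

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))  # [batch, seq_len, d_model]
        return self.head(hidden).squeeze(-1)          # [batch, seq_len]

# Hypothetical training step: every token position has its own label and loss.
model = PerTokenClassifier()
token_ids = torch.randint(0, 30522, (2, 16))          # [batch=2, seq_len=16]
labels = torch.randint(0, 2, (2, 16)).float()         # 1 = replaced, 0 = original
logits = model(token_ids)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```

In contrast, a sequence-level objective would pool the token representations (or take a single [CLS] position) and compute one loss per sequence; here the loss averages over every token position, so each token contributes its own supervision signal.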

Tags: Ch.1 Pre-training, Foundations of Large Language Models, Computing Sciences