Concept

Replaced Token Detection as a Self-Supervised Task

This self-supervised task, exemplified by the ELECTRA model, involves a two-part setup: a generator and a discriminator. The generator, a small masked language model, first corrupts an input sequence by replacing some tokens with plausible alternatives. The discriminator, the main Transformer encoder being trained, then processes this corrupted sequence. Its objective is per-token binary classification: deciding whether each token comes from the original input or is a replacement produced by the generator. Because this classification loss is defined over every token, rather than only the small masked subset used in standard masked language modeling (typically about 15% of positions), the approach yields more sample-efficient pre-training.
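
To make the two-part setup concrete, below is a minimal PyTorch sketch of one replaced-token-detection training step. It is an illustration under stated assumptions, not ELECTRA's actual configuration: the tiny encoder sizes, the 15% masking rate, and the helper names (TinyEncoder, rtd_step, MASK_ID) are all invented for the example, and the generator's own masked-language-modeling loss, which ELECTRA trains jointly, is omitted for brevity.

```python
# Minimal replaced-token-detection sketch; names and sizes are illustrative.
import torch
import torch.nn as nn

VOCAB_SIZE, MAX_LEN, MASK_ID = 1000, 32, 0

class TinyEncoder(nn.Module):
    """Token + position embeddings, Transformer encoder, per-token head."""
    def __init__(self, d_model, n_layers, out_dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        self.pos = nn.Embedding(MAX_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        h = self.encoder(self.embed(ids) + self.pos(pos))
        return self.head(h)  # (batch, seq_len, out_dim)

# Small generator (MLM head over the vocabulary) and larger discriminator
# (binary replaced-vs-original head), mirroring the two-part setup.
generator = TinyEncoder(d_model=64, n_layers=1, out_dim=VOCAB_SIZE)
discriminator = TinyEncoder(d_model=128, n_layers=2, out_dim=1)

def rtd_step(ids, mask_prob=0.15):
    # 1) Corrupt: mask a random subset of positions.
    mask = torch.rand(ids.shape) < mask_prob
    masked_ids = ids.masked_fill(mask, MASK_ID)

    # 2) Generator fills masked positions with sampled, plausible tokens.
    #    (Its own MLM training loss is omitted in this sketch.)
    with torch.no_grad():
        logits = generator(masked_ids)
        samples = torch.distributions.Categorical(logits=logits).sample()
    corrupted = torch.where(mask, samples, ids)

    # 3) Discriminator labels every token: 1 = replaced, 0 = original.
    #    A sampled token that happens to equal the original counts as original.
    labels = (corrupted != ids).float()
    scores = discriminator(corrupted).squeeze(-1)
    return nn.functional.binary_cross_entropy_with_logits(scores, labels)

ids = torch.randint(1, VOCAB_SIZE, (4, MAX_LEN))  # toy batch of token ids
print(rtd_step(ids))  # RTD loss averaged over all positions
```

The key point the sketch makes explicit is that the binary labels are defined at every position, so the discriminator receives a supervision signal over the full sequence rather than only the masked subset.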

Updated 2026-05-02


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences