Joint Training in Replaced Token Detection
In the Replaced Token Detection framework, the generator and discriminator are trained simultaneously. The generator is trained as a masked language model, using maximum likelihood estimation to predict the original tokens at the masked positions. Concurrently, the discriminator is trained as a per-token binary classifier, minimizing a classification loss that identifies which tokens in the sequence were replaced by the generator. In models like ELECTRA, the two losses are summed into a single weighted objective to drive this joint training.
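The combined objective can be sketched with toy numbers. This is a minimal illustration, not ELECTRA's actual implementation: the function names and the example probabilities are hypothetical, though the loss forms (MLM negative log-likelihood plus a λ-weighted per-token binary cross-entropy, with λ = 50 in the ELECTRA paper) follow the framework described above.

```python
import math

def mlm_loss(probs_of_original_tokens):
    """Generator loss: average negative log-likelihood of the
    original tokens at the masked positions (maximum likelihood)."""
    return -sum(math.log(p) for p in probs_of_original_tokens) / len(probs_of_original_tokens)

def rtd_loss(pred_replaced_probs, is_replaced):
    """Discriminator loss: binary cross-entropy over EVERY token,
    classifying each as original (0) or replaced (1)."""
    eps = 1e-9
    total = 0.0
    for p, y in zip(pred_replaced_probs, is_replaced):
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(is_replaced)

# Toy values: the generator predicts the original token at 3 masked
# positions; the discriminator scores all 6 tokens of the corrupted
# sentence, two of which were replaced.
l_gen = mlm_loss([0.7, 0.5, 0.9])
l_disc = rtd_loss([0.1, 0.8, 0.2, 0.9, 0.05, 0.1], [0, 1, 0, 1, 0, 0])

lam = 50.0  # ELECTRA weights the discriminator loss heavily (λ = 50)
joint_loss = l_gen + lam * l_disc
```

Both networks are updated by minimizing `joint_loss`; note that, unlike a GAN, the generator is not trained to fool the discriminator — it simply optimizes its own maximum-likelihood term.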
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Generator in Replaced Token Detection
The Discriminator in Replaced Token Detection
Joint Training in Replaced Token Detection
Model Usage After Replaced Token Detection Training
Consider a pre-training method for a language model that uses two components. The first component, a 'generator', takes an original sentence and replaces a few words with other plausible words. The second component, a 'discriminator', then reads this modified sentence. The discriminator's task is to examine every single word in the modified sentence and decide for each one: 'Is this word from the original sentence, or is it a replacement?' What is the primary advantage of training the discriminator on this per-word classification task compared to a task where it only has to predict the original identity of the few words that were replaced?
Analyzing a Language Model's Training Step
A language model is being pre-trained using a method where it learns to distinguish original words from plausible replacements. Arrange the following steps of a single training iteration into the correct chronological order.
Learn After
GAN-based Training for Replaced Token Detection
In a language model pre-training setup, a 'generator' network corrupts an input sentence by replacing some tokens. A separate 'discriminator' network is then tasked with identifying which tokens in the corrupted sentence are original and which are replacements. If both networks are trained simultaneously, which statement best distinguishes their respective optimization goals?
Differentiating Training Objectives in a Two-Network Model
Analysis of Joint Training Dynamics in a Two-Network Model