Joint Training in Replaced Token Detection

In the Replaced Token Detection framework, the generator and discriminator are trained simultaneously. The generator is trained as a masked language model, maximizing the likelihood of the original tokens at masked positions. Concurrently, the discriminator is trained as a per-token binary classifier, minimizing a classification loss that identifies which tokens were replaced by the generator's samples. In models like ELECTRA, the two losses are summed into a single objective, with the discriminator term weighted by a coefficient, and minimized jointly; gradients are not backpropagated from the discriminator through the generator's sampling step.
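The combined objective described above can be sketched as follows. This is a minimal, framework-free illustration (numpy, not a real training loop); the function names, array shapes, and the weight `lam` are assumptions for the sketch, with `lam = 50.0` taken from the weighting reported in the ELECTRA paper.

```python
import numpy as np

LAMBDA = 50.0  # assumed discriminator weight, as reported in the ELECTRA paper


def generator_mlm_loss(logits, targets, masked):
    """Softmax cross-entropy of the generator, averaged over masked positions only.

    logits: (seq_len, vocab) generator outputs; targets: (seq_len,) original token ids;
    masked: (seq_len,) boolean mask of positions the generator must predict.
    """
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    nll = -np.log(probs[np.arange(len(targets)), targets])
    return nll[masked].mean()


def discriminator_rtd_loss(logits, replaced):
    """Binary cross-entropy of the discriminator over ALL positions.

    logits: (seq_len,) per-token scores; replaced: (seq_len,) 1 if the token
    was substituted by a generator sample, 0 if it is the original token.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(replaced * np.log(p) + (1 - replaced) * np.log(1 - p)).mean()


def joint_loss(gen_logits, targets, masked, disc_logits, replaced, lam=LAMBDA):
    """Combined RTD objective: L_MLM + lam * L_Disc, minimized jointly."""
    return (generator_mlm_loss(gen_logits, targets, masked)
            + lam * discriminator_rtd_loss(disc_logits, replaced))
```

Note the asymmetry the sketch makes visible: the generator's loss is computed only at masked positions, while the discriminator's loss covers every position in the sequence, which is one source of ELECTRA's sample efficiency.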

Updated 2026-04-16

Ch.1 Pre-training - Foundations of Large Language Models