The Generator in Replaced Token Detection
In the Replaced Token Detection framework, the generator is a small masked language model whose job is to produce a corrupted version of the original input text. The procedure has two main steps: first, a random subset of tokens in the sequence is masked; second, the generator, which is trained to predict the original token at each masked position, fills those positions with its own predictions. The result is a new sequence in which each masked token has been replaced by the generator's prediction, which may or may not match the original token. This altered sequence is then passed to the discriminator.
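To make the two-step procedure concrete, here is a minimal Python sketch of the corruption step. The function name corrupt_with_generator, the generator_predict callable, and the toy_generator stand-in are illustrative assumptions rather than part of any particular library; a real setup would use a small trained masked language model for the prediction step.

```python
import random

def corrupt_with_generator(tokens, generator_predict, mask_rate=0.15, mask_token="[MASK]"):
    """Build a corrupted token sequence plus per-token 'replaced?' labels.

    `generator_predict` is any callable mapping (masked tokens, masked positions)
    to one predicted token per masked position (hypothetical interface).
    """
    # Step 1: randomly choose a subset of positions to mask.
    positions = [i for i in range(len(tokens)) if random.random() < mask_rate]

    # Build the masked input that the generator sees.
    masked = [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

    # Step 2: the generator predicts a token for each masked position.
    predictions = generator_predict(masked, positions)

    # Splice the predictions back in; a prediction may or may not match the original.
    corrupted = list(tokens)
    for i, pred in zip(positions, predictions):
        corrupted[i] = pred

    # Per-token labels for the discriminator: 1 = replaced, 0 = original.
    # Note that a masked position whose prediction happens to equal the
    # original token is labeled as original.
    labels = [1 if corrupted[i] != tokens[i] else 0 for i in range(len(tokens))]
    return corrupted, labels


# Toy stand-in for a small masked language model: always guesses "cat".
toy_generator = lambda masked, positions: ["cat" for _ in positions]

corrupted, labels = corrupt_with_generator(
    "the quick brown fox jumps over the lazy dog".split(), toy_generator
)
print(corrupted)
print(labels)
```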

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
The Generator in Replaced Token Detection
The Discriminator in Replaced Token Detection
Joint Training in Replaced Token Detection
Model Usage After Replaced Token Detection Training
Consider a pre-training method for a language model that uses two components. The first component, a 'generator', takes an original sentence and replaces a few words with other plausible words. The second component, a 'discriminator', then reads this modified sentence. The discriminator's task is to examine every single word in the modified sentence and decide for each one: 'Is this word from the original sentence, or is it a replacement?' What is the primary advantage of training the discriminator on this per-word classification task compared to a task where it only has to predict the original identity of the few words that were replaced?
Analyzing a Language Model's Training Step
A language model is being pre-trained using a method where it learns to distinguish original words from plausible replacements. Arrange the following steps of a single training iteration into the correct chronological order.
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
In a two-model pre-training setup, a small 'generator' model first processes an input sentence by masking some words and then filling those masked positions with its own predictions. The resulting, potentially altered, sentence is then passed to a larger 'discriminator' model. What is the most critical function of the generator's output in this process?
Evaluating Corrupted Text for Model Training
A small masked language model is used to create a corrupted version of an input text sequence for a subsequent training task. Arrange the steps this model takes to generate the final corrupted sequence in the correct chronological order.
Visual Example of Generator Operation in Replaced Token Detection