Concept

Self-Supervised Pre-training of Encoders via Masked Language Modeling

In the pre-training phase, an encoder model is trained with a self-supervised objective such as Masked Language Modeling. The process begins by corrupting the input sequence, replacing some tokens with a special mask token, and converting it into a sequence of embeddings. This embedding sequence is fed into the encoder, which produces a contextual vector representation for every input position. These representations are then passed to a Softmax output layer over the vocabulary, and the model is trained to reconstruct the original tokens at the masked positions, as sketched below.
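A minimal PyTorch sketch of this procedure follows. The model sizes, masking probability, and special-token ids are illustrative assumptions, not the configuration of any specific encoder; the point is the flow from corrupted input, to embeddings, to contextual representations, to a Softmax output layer trained only on the masked positions.

```python
# Minimal sketch of masked-language-model pre-training for an encoder.
# All hyperparameters and special-token ids below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, D_MODEL, MAX_LEN = 30000, 256, 128
PAD_ID, MASK_ID = 0, 1  # assumed special-token ids

class MLMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)            # token embeddings
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)               # learned position embeddings
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)   # bidirectional encoder
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)                   # Softmax output layer (logits)

    def forward(self, corrupted_ids):
        pos = torch.arange(corrupted_ids.size(1), device=corrupted_ids.device)
        x = self.tok_emb(corrupted_ids) + self.pos_emb(pos)         # embedding sequence
        h = self.encoder(x)                                         # contextual representations
        return self.out(h)                                          # per-position vocabulary logits

def mlm_loss(model, original_ids, mask_prob=0.15):
    # Corrupt the input: replace a random subset of (non-padding) tokens with [MASK].
    mask = (torch.rand_like(original_ids, dtype=torch.float) < mask_prob) & (original_ids != PAD_ID)
    corrupted = original_ids.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    # Reconstruct only the masked positions; all other targets are ignored.
    targets = original_ids.masked_fill(~mask, -100)
    return F.cross_entropy(logits.view(-1, VOCAB_SIZE), targets.view(-1), ignore_index=-100)

model = MLMEncoder()
batch = torch.randint(2, VOCAB_SIZE, (8, MAX_LEN))  # toy batch of token ids
loss = mlm_loss(model, batch)
loss.backward()  # gradients for one self-supervised pre-training step
```

In practice the corruption scheme is often more elaborate (e.g., keeping or randomly replacing a fraction of the selected tokens rather than always masking them), but the training signal is the same: cross-entropy on the masked positions only.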

Updated 2026-05-02

Tags

Deep Learning, Data Science, Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Foundations of Large Language Models
