BERT's Masked Language Modeling Pre-training Pipeline
The pre-training pipeline for BERT's Masked Language Modeling (MLM) is a multi-step process. It begins with an input sequence, from which 15% of the tokens are randomly selected for prediction. Of these selected tokens, 80% are replaced with the special [MASK] token, 10% are replaced with random tokens from the vocabulary, and 10% are left unchanged. The modified sequence is converted into embeddings and processed by a Transformer encoder to produce contextualized hidden states. Finally, the model is trained to predict the original tokens at the selected positions from the corresponding hidden states.
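As a concrete illustration, the sketch below applies the 80/10/10 corruption rule to a list of token ids in plain Python. The specific values used here (MASK_ID, the vocabulary size, and the special-token ids standing in for [CLS] and [SEP]) are illustrative assumptions rather than values taken from a particular tokenizer; a real implementation would read them from the model's vocabulary.

```python
import random

# Illustrative ids only; real BERT vocabularies define their own values.
MASK_ID = 103
VOCAB_SIZE = 30522
SPECIAL_IDS = {101, 102}  # e.g. [CLS], [SEP] -- never selected for masking

def apply_mlm_masking(token_ids, select_prob=0.15, seed=None):
    """Return (corrupted_ids, labels) following the 80/10/10 rule.

    labels[i] holds the original token id at positions selected for
    prediction and -100 (a conventional "ignore" marker) everywhere else,
    so the loss is computed only on the selected ~15% of positions.
    """
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)

    for i, tok in enumerate(token_ids):
        if tok in SPECIAL_IDS or rng.random() >= select_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # else: 10% keep the original token unchanged

    return corrupted, labels
```

The corrupted sequence would then be embedded and passed through the Transformer encoder, and the hidden state at each position where the label is not -100 is fed to the output layer to predict the original token.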
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Performance Degradation and Early Stopping in Pre-training
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
An input sequence of 200 tokens is processed during a model's self-supervised pre-training. The procedure first selects 15% of the tokens for modification. Of this selected group, 80% are replaced with a special mask symbol, 10% are replaced with a different, random token, and the final 10% are left as they are. Given this process, which statement accurately describes the state of the 200-token sequence after this modification step?
A language model is pre-trained using a masked language modeling objective. Arrange the following stages of its data processing and training pipeline in the correct chronological order.
Analyzing a Pre-training Pipeline Implementation