A language model is trained using an objective where every token in the input sentence is replaced by a [MASK] token. The model is then required to reconstruct the entire original sentence. How does the primary skill developed by this training method differ from that developed by a method where only a small fraction (e.g., 15%) of the tokens are masked?
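To make the contrast concrete, here is a minimal sketch of how a training example could be constructed under the two masking regimes. The word-level tokenization, the `mask_tokens` helper, and the use of `None` to mark loss-free positions are illustrative assumptions, not any particular library's API.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob):
    """Replace each token with [MASK] with probability mask_prob.

    Returns the corrupted input and the reconstruction targets:
    masked positions carry the original token (and contribute to
    the loss); unmasked positions carry None (ignored by the loss).
    """
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)    # this position contributes to the loss
        else:
            corrupted.append(tok)
            targets.append(None)   # this position is ignored by the loss
    return corrupted, targets

sentence = "the quick brown fox jumps".split()

# BERT-style objective: mask only a small fraction (~15%) of tokens.
partial_input, partial_targets = mask_tokens(sentence, mask_prob=0.15)

# 100% masking: every position is [MASK], so the model must reconstruct
# the whole sentence with no lexical evidence from the input.
full_input, full_targets = mask_tokens(sentence, mask_prob=1.0)

print(partial_input, partial_targets)
print(full_input, full_targets)
```

Note the design difference the sketch exposes: in the 15% setting the corrupted input still contains most of the original words as context, whereas in the 100% setting the input carries no content at all, which is why that scenario is often framed as training a plain language model (see "Training the Decoder as a Language Model in 100% Masking Scenarios" under Related).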
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Training the Decoder as a Language Model in 100% Masking Scenarios
Constructing a 100% Masked Training Example
Evaluating a Model Training Strategy