True/False

A language model trained exclusively for next-token prediction (i.e., predicting each token based only on the tokens that precede it) can be framed as a special case of a masked language model in which, for every prediction, all subsequent tokens in the sequence are masked.
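
For intuition, here is a minimal sketch of the equivalence the statement describes: each next-token prediction step of a causal LM corresponds to a masked-LM input in which the target position and everything after it are masked. The token sequence and the `[MASK]` placeholder are illustrative assumptions, not from the source.

```python
# Minimal sketch: next-token prediction viewed as masked language modeling.
# The token sequence and the "[MASK]" placeholder are illustrative assumptions.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
n = len(tokens)

for t in range(1, n):
    # A causal LM predicting position t conditions only on tokens[:t].
    # The equivalent masked-LM input masks position t and every later position.
    masked_view = tokens[:t] + ["[MASK]"] * (n - t)
    print(f"predict position {t}: input = {masked_view} -> target = {tokens[t]!r}")
```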

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science