1Cademy - Limitation of MLM: Ignoring Dependencies Between Masked Tokens

Learn Before

Self-Supervised Pre-training of Encoders via Masked Language Modeling

Concept

Limitation of MLM: Ignoring Dependencies Between Masked Tokens

A key limitation of the auto-encoding objective in Masked Language Modeling (MLM) is that it does not account for dependencies among the masked tokens. The model is trained to predict each masked token independently of the others, conditioning only on the corrupted input sequence. For example, if two tokens $x_2$ and $x_6$ in a sequence are masked, the prediction for $x_2$ is made without any knowledge of what is predicted for $x_6$, and vice versa, even though $x_6$ should ideally be predicted in the context of $x_2$.
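To see why this matters, consider a minimal sketch in Python. It assumes a made-up joint distribution over the two masked positions $x_2$ and $x_6$; the tokens and probabilities are hypothetical and chosen only for illustration. Treating the two positions independently amounts to multiplying their per-position marginals, which can assign probability to combinations that the true joint distribution rules out.

```python
from collections import defaultdict

# Hypothetical "true" joint distribution over the masked pair (x2, x6).
# The two tokens are strongly dependent: "cat" goes with "meowed",
# "dog" goes with "barked". These numbers are invented for illustration.
joint = {
    ("cat", "meowed"): 0.5,
    ("dog", "barked"): 0.5,
}


def marginals(joint_dist):
    """Return the per-position marginals of a two-position joint distribution."""
    p_x2, p_x6 = defaultdict(float), defaultdict(float)
    for (tok2, tok6), prob in joint_dist.items():
        p_x2[tok2] += prob
        p_x6[tok6] += prob
    return dict(p_x2), dict(p_x6)


def independent_product(p_x2, p_x6):
    """Combine marginals as if the two positions were independent."""
    return {
        (tok2, tok6): prob2 * prob6
        for tok2, prob2 in p_x2.items()
        for tok6, prob6 in p_x6.items()
    }


p_x2, p_x6 = marginals(joint)
factorized = independent_product(p_x2, p_x6)

# Compare the true joint probability with the independent approximation.
for pair in sorted(factorized):
    print(pair, "joint:", joint.get(pair, 0.0), "independent:", factorized[pair])
```

In this toy setting, each position on its own is predicted well (the marginals are correct), yet the independent combination spreads probability over mismatched pairs such as ("cat", "barked"), which the joint distribution never produces.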