Permuted Language Modeling (PLM)
Permuted Language Modeling (PLM) is a pre-training approach designed to resolve specific issues found in Masked Language Modeling, such as the mismatch between pre-training and fine-tuning (the [MASK] symbol never appears in downstream data) and the independence assumption among masked tokens. Although PLM is a sequential prediction task, the actual order of tokens in the original text remains completely unchanged. Instead, the model predicts the tokens auto-regressively according to a randomly sampled permutation of their positions, so each token is conditioned on all tokens that precede it in that permuted order.
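The idea above can be sketched in a few lines of Python. This is a minimal illustration (not from the course material): the helper `plm_prediction_order` is a hypothetical name, and real PLM implementations such as XLNet realize the permutation through attention masks rather than by literally reordering predictions.

```python
import random

def plm_prediction_order(tokens, seed=0):
    """Return the steps a PLM would take when pre-training on `tokens`.

    The original token order is untouched; only the *prediction* order is
    permuted. At step t the model may condition on every token that was
    revealed at an earlier step of the permutation.
    """
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)          # an arbitrarily sampled factorization order
    steps, visible = [], []
    for pos in positions:
        steps.append({"position": pos,
                      "predict": tokens[pos],
                      "context": list(visible)})  # tokens revealed so far
        visible.append(tokens[pos])
    return steps

# Example: the model might predict 'learns' before ever seeing 'model'.
for step in plm_prediction_order("The model learns from text".split()):
    print(step)
```

Note that every token is eventually predicted exactly once, and the context grows by one token per step, which is what makes the objective a valid auto-regressive factorization of the full sequence probability.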
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Chain Rule of Probability for Auto-regressive Language Models
Permuted Language Modeling (PLM)
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' The model's primary purpose is to generate new text by predicting the next word in a sequence based only on the words that came before it. When the model is calculating the representation for the word 'jumps' during this process, which part of the sentence is it allowed to consider?
Permuted Language Modeling
Model Architecture Suitability for Sentiment Analysis
Rationale for Auto-Regressive Model Design in Text Generation
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Permuted Language Modeling (PLM)
Next Sentence Prediction as an Auxiliary Training Objective
Permuted Language Modeling
Learning Contextual Representations via Masked Token Prediction
A language model is being trained with the following objective: It is given a sentence with a single word randomly obscured, such as 'The quick brown [HIDDEN] jumps over the lazy dog.' The model's only task is to predict the original hidden word, 'fox'. Which of the following best describes the specific contextual information the model is designed to use to make this prediction?
Analyzing a Model Training Process
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' Which of the following training scenarios best exemplifies the process of learning by predicting an obscured word using its full surrounding context?
MASS-style Masked Language Modeling
BERT-style Masked Language Modeling
Impact of Pre-training/Fine-tuning Mismatch on Downstream Tasks
A language model is first trained on a large text corpus where some words in each sentence are replaced with a special [MASK] symbol, and the model's goal is to predict the original words. Subsequently, this pre-trained model is adapted for a specific task, such as sentiment analysis, using a new dataset of complete, un-masked sentences. Which of the following statements best analyzes the primary architectural conflict that arises between these two phases?
Troubleshooting a Pre-trained Model's Performance
Permuted Language Modeling (PLM)
Diagnosing a Language Model's Predictive Behavior
A language model pre-trained with a standard masked language modeling objective is given the input sentence: 'The capital of the United Kingdom is [MASK] [MASK].' Which statement best describes how the model will predict the two masked tokens?
Consequences of Independent Predictions in Language Models
Permuted Language Modeling (PLM)
Learn After
Probability Factorization for Arbitrary Order Token Prediction
A language model is pre-trained using an objective where, for the input sentence 'The model learns from text', it might be tasked to predict the word 'learns' based on the context of 'text' and 'The', while the word 'model' is not yet visible to it. In the next step, it might predict 'model' based on 'text', 'The', and the newly predicted 'learns'. What is the primary advantage of this training approach compared to a standard left-to-right sequential prediction?
A language model is being pre-trained on the sentence 'The quick brown fox jumps' using a permuted objective. The model is given a random permutation of the token positions: (3, 5, 1, 4, 2). Arrange the words from the sentence in the order they will be auto-regressively predicted during this training step.
Pre-training Objective Selection
Comparison of Permuted and Causal Language Modeling
Implementing Permutation via Self-Attention Masks
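The last two items above can be made concrete with a short sketch. Assuming the permutation (3, 5, 1, 4, 2) from the exercise (1-indexed positions), the code below derives the auto-regressive prediction order and builds the boolean self-attention mask that implements it: position i may attend to position j only if j comes earlier in the permutation. The function name `permutation_attention_mask` is illustrative, not from any library.

```python
def permutation_attention_mask(perm):
    """Boolean mask for a permuted factorization order.

    `perm` lists 1-indexed token positions in prediction order.
    mask[i][j] is True when position i (0-indexed) may attend to
    position j, i.e. when j was predicted strictly before i.
    """
    n = len(perm)
    rank = {pos - 1: t for t, pos in enumerate(perm)}  # 0-indexed position -> step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]

perm = (3, 5, 1, 4, 2)
tokens = "The quick brown fox jumps".split()
print([tokens[p - 1] for p in perm])  # prediction order: brown, jumps, The, fox, quick

mask = permutation_attention_mask(perm)
for row in mask:
    print(["x" if v else "." for v in row])
```

Because the permutation is enforced purely through the mask, the input sequence itself keeps its original left-to-right layout, which is exactly the point made in the definition above.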