Comparison of Arbitrary Order Prediction and Masked Language Modeling
Predicting tokens in an arbitrary, permuted order allows generation to be conditioned on a broader context, sharing conceptual similarities with Masked Language Modeling (MLM). Rather than being limited to the preceding tokens, as in standard left-to-right models, it enables the use of bidirectional context. For example, given the generation order x_2 → x_0 → x_3 → x_1, when generating token x_1 the model can consider both its left-context (embedding e_0) and its right-context (embeddings e_2 and e_3). Because these embeddings incorporate the positional information of their respective tokens (x_0, x_2, and x_3), the original sequence order is preserved. Consequently, this approach functions similarly to MLM: it is as if x_1 is masked out, and the model uses its surrounding tokens to predict it.
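As a concrete illustration (reusing the four-token example from the related questions below), the joint probability under the generation order x_2 → x_0 → x_3 → x_1 factorizes step by step as:

    Pr(x_0, x_1, x_2, x_3) = Pr(x_2) · Pr(x_0 | e_2) · Pr(x_3 | e_2, e_0) · Pr(x_1 | e_2, e_0, e_3)

The final factor, Pr(x_1 | e_2, e_0, e_3), conditions on x_1's left-context (e_0) and right-context (e_2 and e_3), which is exactly the context MLM would use if x_1 were the masked token.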

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Visual Representation of Permuted Language Modeling
A language model is tasked with generating a four-token sequence, originally ordered as (x_0, x_1, x_2, x_3). Instead of a standard left-to-right approach, the model generates the tokens in the following arbitrary order: x_2 → x_0 → x_3 → x_1. Given this generation order, which expression correctly represents the conditional probability for predicting the final token, x_1? (Note: e_i represents the embedding of token x_i)
Contextual Advantages of Non-Sequential Token Generation
A language model is generating a five-token sequence, originally ordered as (x_0, x_1, x_2, x_3, x_4). The model generates the tokens in the following arbitrary order: x_3 → x_1 → x_4 → x_0 → x_2. Arrange the conditional probability terms below to correctly represent the joint probability factorization for this specific generation order. (Note: e_i represents the embedding of token x_i.)
Comparison of Arbitrary Order Prediction and Masked Language Modeling
Permuted Language Modeling (PLM)
Next Sentence Prediction as an Auxiliary Training Objective
Permuted Language Modeling
Learning Contextual Representations via Masked Token Prediction
A language model is being trained with the following objective: It is given a sentence with a single word randomly obscured, such as 'The quick brown [HIDDEN] jumps over the lazy dog.' The model's only task is to predict the original hidden word, 'fox'. Which of the following best describes the specific contextual information the model is designed to use to make this prediction?
Analyzing a Model Training Process
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' Which of the following training scenarios best exemplifies the process of learning by predicting an obscured word using its full surrounding context?
MASS-style Masked Language Modeling
BERT-style Masked Language Modeling
Self-Attention layer understanding - Step 5 - Adding the time
Input Embedding with Positional Encoding
Learnable Absolute Positional Embeddings
Initial Input Representation for Transformer Layers
Comparison of Arbitrary Order Prediction and Masked Language Modeling
An engineer builds a language model where all input words in a sentence are processed simultaneously and independently before their information is combined. When testing the model with the sentences 'The cat chased the dog' and 'The dog chased the cat', the engineer observes that the model generates identical internal representations for both, failing to capture their different meanings. Which of the following modifications would most directly address this fundamental flaw?
Model Architecture Design Choice
Analyzing Order-Insensitivity in Language Models
Learn After
A language model is generating a sequence of tokens. It has already determined the tokens at original positions 1, 2, and 5, and is now in the process of predicting the token for original position 3. This specific prediction step is analogous to a masked language modeling task. Which statement best analyzes the reason for this analogy?
Analyzing Prediction Context in Arbitrary Order Generation
A key characteristic of arbitrary order prediction is that, at any given step, the task of predicting the next token is functionally identical to a standard masked language modeling task because both utilize bidirectional context.
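To make this comparison concrete, here is a minimal sketch in plain Python (the helper names plm_context and mlm_context are purely illustrative, not taken from any library) that lists which original positions are visible at each prediction step for the order x_2 → x_0 → x_3 → x_1:

    # Minimal illustrative sketch: compare the context visible to permuted language
    # modeling (PLM) at each step with the context visible to MLM for one masked token.

    def plm_context(order, step):
        """Original positions already generated when predicting the token at `step`."""
        return set(order[:step])

    def mlm_context(seq_len, masked_pos):
        """All positions except the single masked one."""
        return set(range(seq_len)) - {masked_pos}

    order = [2, 0, 3, 1]  # generation order over the original positions 0..3

    # Predicting the last token in the permutation (x_1) sees positions {0, 2, 3},
    # which is exactly what MLM would see if x_1 were the masked token.
    assert plm_context(order, 3) == mlm_context(4, masked_pos=1) == {0, 2, 3}

    # Earlier steps see only the tokens generated so far, not the full sequence:
    assert plm_context(order, 2) == {0, 2}  # predicting x_3 uses x_2 and x_0 only

In this sketch, only the final step of the permutation sees the full bidirectional context; earlier steps condition on just the tokens generated so far, so the analogy to MLM concerns how context is used at a given step rather than an exact equivalence at every step.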