Learn Before
A language model is pre-trained using an objective where, for the input sentence 'The model learns from text', it might be tasked to predict the word 'learns' based on the context of 'text' and 'The', while the word 'model' is not yet visible to it. In the next step, it might predict 'model' based on 'text', 'The', and the newly predicted 'learns'. What is the primary advantage of this training approach compared to a standard left-to-right sequential prediction?
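The objective described above can be sketched in a few lines. This is a minimal illustration, not from the source: `permuted_prediction_order` is a hypothetical helper, and the sentence and permutation are taken from the second exercise ('The quick brown fox jumps' with position order (3, 5, 1, 4, 2)).

```python
def permuted_prediction_order(tokens, permutation):
    """List (target, visible_context) pairs for a permuted LM objective.

    `permutation` gives 1-based token positions in the order they are
    predicted; each target is conditioned only on the tokens that were
    predicted earlier in the permutation, not on left-to-right order.
    """
    steps = []
    seen = []  # tokens already predicted, hence visible as context
    for pos in permutation:
        target = tokens[pos - 1]
        steps.append((target, list(seen)))
        seen.append(target)
    return steps

tokens = "The quick brown fox jumps".split()
for target, context in permuted_prediction_order(tokens, (3, 5, 1, 4, 2)):
    print(f"predict {target!r} given {context}")
```

With this permutation, 'brown' is predicted first with no context, then 'jumps' given ['brown'], and so on, showing how the factorization exposes contexts that a strict left-to-right model never sees.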
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Factorization for Arbitrary Order Token Prediction
A language model is being pre-trained on the sentence 'The quick brown fox jumps' using a permuted objective. The model is given a random permutation of the token positions: (3, 5, 1, 4, 2). Arrange the words from the sentence in the order they will be auto-regressively predicted during this training step.
Pre-training Objective Selection
Comparison of Permuted and Causal Language Modeling
Implementing Permutation via Self-Attention Masks