Learn Before
Probability of a True Token in MLM
In Masked Language Modeling (MLM), the expression $\Pr(x_i \mid \tilde{\mathbf{x}})$ denotes the probability of predicting the correct, true token $x_i$ at a specific position $i$, given the corrupted input sequence $\tilde{\mathbf{x}}$. This conditional probability depends on the model's learned parameters, represented by the weights $\mathbf{W}$ and $\mathbf{b}$.
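To make this concrete, here is a minimal sketch (toy vocabulary, random hidden state, and assumed weight shapes, none taken from the card) of how the probability of the true token at a masked position can be obtained from a softmax over output weights, and how its negative log gives the per-token loss:

```python
import numpy as np

# Minimal sketch with assumed names and toy sizes, not the card's model.
vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # toy vocabulary
true_token = "mat"                                  # original token behind [MASK]

rng = np.random.default_rng(0)
h_i = rng.standard_normal(8)                 # hidden state at the masked position i
W = rng.standard_normal((len(vocab), 8))     # output weight matrix (assumed shape)
b = rng.standard_normal(len(vocab))          # output bias

logits = W @ h_i + b                         # one score per vocabulary entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the vocabulary

p_true = probs[vocab.index(true_token)]      # P(true token | corrupted input)
loss = -np.log(p_true)                       # per-token cross-entropy / negative log-likelihood
print(f"P(true token) = {p_true:.3f}, loss = {loss:.3f}")
```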

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Probability of a True Token in MLM
Predicted Probability Distribution in MLM
Example of MLM Training Objective with Multiple Masks
MLM Loss Function as Negative Log-Likelihood
A language model is being trained to fill in a masked word. For the input 'The cat sat on the [MASK]', the correct word is 'mat'. The training objective is to adjust the model to minimize the cross-entropy loss for its predictions. Below are four different potential outputs from the model, showing the probability it assigns to the word 'mat'. Which of these outputs would result in the LOWEST loss for this specific training example?
Evaluating Model Performance via Cross-Entropy Loss
According to the standard Masked Language Modeling (MLM) training objective, a model's parameters are adjusted based on the cross-entropy loss calculated for a single, strategically chosen masked token within a training batch, aiming to optimize performance on that specific prediction.
Learn After
A masked language model is given the input sequence: 'The quick brown [MASK] jumps over the lazy dog.' The original, unmasked token at the [MASK] position was 'fox'. Two different versions of the model, Model A and Model B, are used to predict the masked token.
- Model A assigns a probability of 0.85 to the token 'fox'.
- Model B assigns a probability of 0.15 to the token 'fox', and its highest predicted probability is 0.40 for the token 'cat'.
Based on the probability assigned to the correct, original token, which of the following statements provides the most accurate analysis of the models' performance on this specific example?
Analyzing Model Learning via Token Probability
A language model is being trained on the task of filling in masked words. At an early stage of training, for the sentence 'The sun rises in the [MASK]', the model assigns a probability of 0.05 to the correct word 'east'. After many more rounds of successful training on a large dataset, the model is presented with the same masked sentence. Which of the following outcomes is the most plausible and directly reflects the objective of this training process?
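As a toy illustration of that training objective (hypothetical logits and a single gradient-descent step, not the card's data), minimizing the per-token cross-entropy loss directly increases the probability assigned to the correct token:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits over a tiny vocabulary; index 3 is the correct token.
logits = np.array([2.0, 1.0, 0.5, 0.0])
correct = 3

p_before = softmax(logits)[correct]
grad = softmax(logits)          # gradient of -log p_correct w.r.t. logits is (softmax - one-hot)
grad[correct] -= 1.0
logits = logits - 1.0 * grad    # one gradient-descent step (learning rate 1.0)
p_after = softmax(logits)[correct]

print(f"p(correct) before: {p_before:.3f}  after one step: {p_after:.3f}")
```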