Learn Before
Predicted Probability Distribution in MLM
Within the Masked Language Modeling (MLM) framework, the notation $\mathrm{Pr}_i(\cdot \mid \tilde{\mathbf{x}}; \theta, \omega)$ represents the model's predicted probability distribution over the vocabulary for the token at a masked position $i$. This distribution is computed from the corrupted input sequence $\tilde{\mathbf{x}}$ and the model's trainable parameters $\theta$ and $\omega$ (the encoder and output-layer parameters, respectively). During training, this predicted distribution is evaluated against the true token's distribution to compute the cross-entropy loss, i.e., the negative log-probability assigned to the true token.
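To make the notation concrete, here is a minimal sketch in Python, assuming a toy four-word vocabulary and hypothetical encoder logits (none of these values come from the course material): a softmax over the masked position's scores yields the predicted distribution, and the loss is the negative log-probability of the true token.

```python
import math

# A minimal sketch of the MLM prediction step, using a toy four-word
# vocabulary and made-up logits (all values here are hypothetical).
# The encoder reads the corrupted input and emits one score per
# vocabulary word at the masked position; a softmax converts those
# scores into the predicted distribution.
vocab = ["cat", "sat", "on", "mat"]
logits = [0.5, -1.2, 0.1, 2.3]  # hypothetical encoder outputs

exps = [math.exp(z) for z in logits]
total = sum(exps)
predicted = {w: e / total for w, e in zip(vocab, exps)}

# Cross-entropy against the true token's one-hot distribution reduces
# to the negative log-probability the model assigns to that token.
true_token = "mat"
loss = -math.log(predicted[true_token])

print(predicted)            # probabilities sum to 1
print(f"loss = {loss:.4f}")
```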

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Probability of a True Token in MLM
Predicted Probability Distribution in MLM
Example of MLM Training Objective with Multiple Masks
MLM Loss Function as Negative Log-Likelihood
A language model is being trained to fill in a masked word. For the input 'The cat sat on the [MASK]', the correct word is 'mat'. The training objective is to adjust the model to minimize the cross-entropy loss for its predictions. Below are four different potential outputs from the model, showing the probability it assigns to the word 'mat'. Which of these outputs would result in the LOWEST loss for this specific training example?
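The four answer options are not reproduced here, but a quick numeric sketch (with hypothetical probabilities) shows why the output assigning the highest probability to 'mat' yields the lowest loss:

```python
import math

# Cross-entropy loss for this single masked token is -log p('mat'),
# so the loss shrinks as the assigned probability grows (toy values).
for p in [0.1, 0.3, 0.6, 0.9]:
    print(f"p('mat') = {p:.1f}  ->  loss = {-math.log(p):.3f}")
```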
Evaluating Model Performance via Cross-Entropy Loss
According to the standard Masked Language Modeling (MLM) training objective, a model's parameters are adjusted based on the cross-entropy loss calculated for a single, strategically chosen masked token within a training batch, aiming to optimize performance on that specific prediction.
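For reference, the standard MLM objective aggregates the loss over every masked position in the input rather than a single one; a minimal sketch with hypothetical predicted probabilities for three masked tokens:

```python
import math

# Standard MLM training loss: the negative log-likelihood is summed
# over ALL masked positions in the input, not a single chosen one.
# The probabilities below are hypothetical model outputs for the
# true token at each of three masked positions.
p_true = [0.70, 0.40, 0.55]

loss = -sum(math.log(p) for p in p_true)
print(f"total loss over {len(p_true)} masked tokens = {loss:.4f}")
```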
Learn After
A masked language model processes the input 'The chef carefully seasoned the [MASK] before serving.' For the masked position, the model generates a probability distribution over its entire 30,000-word vocabulary. The word 'soup' is assigned a probability of 0.6, 'dish' is assigned 0.2, and the remaining probability is spread thinly across the other 29,998 words. If the original, unmasked word was 'soup', which of the following statements provides the most accurate analysis of this outcome?
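A short sanity check of the scenario's numbers (taken directly from the question) makes the outcome easier to judge:

```python
import math

# Numbers from the question: p('soup') = 0.6, p('dish') = 0.2, and the
# remaining 0.2 of probability mass spread over the other 29,998 words.
p_soup, p_dish = 0.6, 0.2
remainder = 1.0 - p_soup - p_dish
print(f"mass left for the rest: {remainder:.2f}")
print(f"per remaining word:     {remainder / 29_998:.2e}")

# If 'soup' is the true token, the cross-entropy loss for this
# prediction is -log(0.6), a fairly low (i.e., good) value.
print(f"loss = {-math.log(p_soup):.4f}")
```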
Interpreting a Model's Output Distribution
A language model with a small vocabulary consisting of only four words ('cat', 'sat', 'on', 'mat') is given the input sequence 'the [MASK] sat on the mat'. The model's task is to predict the masked token. Which of the following options represents a valid predicted probability distribution for the masked position?
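As a reminder of what "valid" means here, a tiny check (a hypothetical helper, not from the course) verifies the two defining properties: every probability is non-negative and the values sum to 1 over the whole vocabulary.

```python
def is_valid_distribution(dist, tol=1e-9):
    """True if all probabilities are non-negative and sum to 1."""
    return (all(p >= 0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) <= tol)

# Toy checks over the four-word vocabulary from the question.
print(is_valid_distribution({"cat": 0.7, "sat": 0.1, "on": 0.1, "mat": 0.1}))  # True
print(is_valid_distribution({"cat": 0.7, "sat": 0.5, "on": 0.1, "mat": 0.1}))  # False (sums to 1.4)
```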