Formula

MLM Training Objective using Cross-Entropy Loss

The training objective for Masked Language Modeling (MLM) is to find the model parameters $\widehat{\mathbf{W}}$ and $\hat{\theta}$ that minimize the total cross-entropy loss over a given dataset $\mathcal{D}$. For each modified text sequence $\bar{\mathbf{x}}$, the loss is computed only over the set of selected positions $\mathcal{A}(\mathbf{x})$, comparing the model's predicted probability distribution $\mathbf{p}_{i}^{\mathbf{W},\theta}$ with the ground-truth distribution $\mathbf{p}_{i}^{\mathrm{gold}}$ at each selected position $i$. The complete optimization objective is formulated as:

$$(\widehat{\mathbf{W}},\hat{\theta}) = \arg\min_{\mathbf{W},\theta} \sum_{\mathbf{x} \in \mathcal{D}} \sum_{i \in \mathcal{A}(\mathbf{x})} \mathrm{LogCrossEntropy}\left(\mathbf{p}_{i}^{\mathbf{W},\theta}, \mathbf{p}_{i}^{\mathrm{gold}}\right)$$
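To make the objective concrete, here is a minimal PyTorch sketch (the function name `mlm_loss` and all variable names are illustrative assumptions, not from the source). Since $\mathbf{p}_{i}^{\mathrm{gold}}$ is typically a one-hot distribution over the original token, the cross-entropy at position $i$ reduces to the negative log-probability the model assigns to that token.

```python
# Illustrative sketch only; names are assumptions, not from the source.
import torch
import torch.nn.functional as F

def mlm_loss(logits: torch.Tensor, gold_ids: torch.Tensor,
             mask_positions: torch.Tensor) -> torch.Tensor:
    """Cross-entropy summed over the selected positions A(x) only.

    logits:         [batch, seq_len, vocab]  model outputs for p_i^{W,theta} (pre-softmax)
    gold_ids:       [batch, seq_len]         original token ids (one-hot p_i^{gold})
    mask_positions: [batch, seq_len] bool    True at positions in A(x)
    """
    # -log p(gold token) at every position, then keep only the selected ones.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, gold_ids.unsqueeze(-1)).squeeze(-1)  # [batch, seq_len]
    return (nll * mask_positions).sum()

# Example: batch of 2 sequences of length 8 over a 100-token vocabulary.
logits = torch.randn(2, 8, 100, requires_grad=True)
gold = torch.randint(0, 100, (2, 8))
selected = torch.zeros(2, 8, dtype=torch.bool)
selected[:, [1, 4]] = True  # pretend positions 1 and 4 were masked
loss = mlm_loss(logits, gold, selected)
loss.backward()  # an optimizer step would then update (W, theta)
```

Summing this loss over all sequences in $\mathcal{D}$ and minimizing with respect to the parameters yields $(\widehat{\mathbf{W}},\hat{\theta})$ as in the formula above.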


Tags: Ch.1 Pre-training - Foundations of Large Language Models · Ch.4 Alignment - Foundations of Large Language Models · Foundations of Large Language Models · Foundations of Large Language Models Course · Computing Sciences
