
Example of Masked Language Modeling Loss Calculation

To illustrate the objective of maximizing log-scale probabilities in Masked Language Modeling, consider the original sequence 'The early bird catches the worm', in which two tokens are masked. The corrupted input is

$$\bar{\mathbf{x}} = \text{[CLS] The } \underbrace{\text{[MASK]}}_{\bar{x}_2} \text{ bird catches the } \underbrace{\text{[MASK]}}_{\bar{x}_6}$$

The objective is to maximize the sum of the log-probabilities of predicting the true tokens 'early' ($x_2$) and 'worm' ($x_6$) given this corrupted input. This is formally expressed as

$$\mathrm{Loss} = \log \Pr(x_2 = \textit{early} \mid \bar{\mathbf{x}}) + \log \Pr(x_6 = \textit{worm} \mid \bar{\mathbf{x}})$$

where $\bar{\mathbf{x}}$ is the corrupted input shown above.
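As a minimal sketch of this calculation (not from the source text), the Python snippet below computes the two-term objective from the model's predicted distributions at the two masked positions. The tiny vocabulary and the probability values are invented purely for illustration.

```python
import math

# Hypothetical predicted distributions over a tiny vocabulary at the two
# masked positions, given the corrupted input
#   x-bar = [CLS] The [MASK] bird catches the [MASK]
# The probability values below are made up for illustration.
pr_x2 = {"early": 0.7, "big": 0.2, "late": 0.1}   # distribution at position 2
pr_x6 = {"worm": 0.6, "prize": 0.3, "bus": 0.1}   # distribution at position 6

# The MLM objective sums the log-probabilities of the true tokens at the
# masked positions: log Pr(x2 = early | x-bar) + log Pr(x6 = worm | x-bar).
log_p_early = math.log(pr_x2["early"])
log_p_worm = math.log(pr_x6["worm"])
objective = log_p_early + log_p_worm

print(f"log Pr(x2 = early | x-bar) = {log_p_early:.4f}")  # -0.3567
print(f"log Pr(x6 = worm  | x-bar) = {log_p_worm:.4f}")   # -0.5108
print(f"objective to maximize      = {objective:.4f}")    # -0.8675
```

In practice this objective is implemented as a cross-entropy loss computed only over the masked positions, so maximizing the sum of log-probabilities is equivalent to minimizing its negative.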



