Learn Before
Example of MLM Training Objective with Multiple Masks
To illustrate the Masked Language Modeling (MLM) training objective with multiple masked tokens, consider the original sequence "the early bird catches the worm". If the tokens "early" at position 2 and "worm" at position 6 are masked, the objective is to maximize the sum of log-scale probabilities for correctly predicting these two tokens. Given the corrupted input $\bar{x}$ = "the [MASK] bird catches the [MASK]", where the tokens at positions 2 and 6 have been replaced by [MASK], the loss function to maximize is:

$$\log \Pr(x_2 = \text{early} \mid \bar{x}) + \log \Pr(x_6 = \text{worm} \mid \bar{x})$$

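A minimal sketch of how this two-mask objective can be computed, assuming PyTorch and a toy vocabulary (neither appears in the card; the random logits stand in for a real model's output): cross-entropy is evaluated only at the masked positions, and the log-probabilities of the true tokens are summed.

```python
import torch
import torch.nn.functional as F

# Toy vocabulary for "the early bird catches the worm" (an assumption,
# purely for illustration).
vocab = ["the", "early", "bird", "catches", "worm", "[MASK]"]
tok = {w: i for i, w in enumerate(vocab)}

# Stand-in for a model's output: one logit vector per position.
torch.manual_seed(0)
logits = torch.randn(6, len(vocab))  # (sequence length, vocabulary size)

masked_positions = [1, 5]  # 0-based indices of positions 2 and 6
targets = torch.tensor([tok["early"], tok["worm"]])

# Log-probabilities at the masked positions only.
log_probs = F.log_softmax(logits[masked_positions], dim=-1)

# Sum of log Pr(x_2 = early | x̄) and log Pr(x_6 = worm | x̄).
objective = log_probs[torch.arange(2), targets].sum()

# Training maximizes the objective, i.e. minimizes its negation
# (the summed cross-entropy over the masked positions).
loss = -objective
print(loss.item())
```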
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Probability of a True Token in MLM
Predicted Probability Distribution in MLM
Example of MLM Training Objective with Multiple Masks
MLM Loss Function as Negative Log-Likelihood
A language model is being trained to fill in a masked word. For the input 'The cat sat on the [MASK]', the correct word is 'mat'. The training objective is to adjust the model to minimize the cross-entropy loss for its predictions. Below are four different potential outputs from the model, showing the probability it assigns to the word 'mat'. Which of these outputs would result in the LOWEST loss for this specific training example?
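As a sanity check on the reasoning behind this question: for a single correct word, the cross-entropy loss reduces to $-\log p(\text{mat})$, so the output assigning 'mat' the highest probability yields the lowest loss. A tiny sketch with made-up probabilities (the card's four actual options are not reproduced here):

```python
import math

# Hypothetical probabilities a model might assign to the true word "mat".
for p in [0.1, 0.3, 0.6, 0.9]:
    print(f"p(mat) = {p:.1f}  ->  loss = -log p = {-math.log(p):.3f}")
# The highest p(mat) gives the lowest cross-entropy loss.
```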
Evaluating Model Performance via Cross-Entropy Loss
According to the standard Masked Language Modeling (MLM) training objective, a model's parameters are adjusted based on the cross-entropy loss calculated for a single, strategically chosen masked token within a training batch, aiming to optimize performance on that specific prediction.
Learn After
A language model is being trained using a masked language modeling objective. The original input sentence is 'A quick brown fox jumps over the lazy dog'. During a training step, the tokens 'quick' (at position 2) and 'lazy' (at position 8) are masked. The model receives the corrupted input, denoted as $\bar{x}$: '[CLS] A [MASK] brown fox jumps over the [MASK] dog'. Which of the following mathematical expressions correctly represents the training objective for this specific step, which the model aims to maximize?
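Mirroring the worked example at the top of this page (and assuming the same $\bar{x}$ notation for the corrupted input), the objective for this step would take the form:

```latex
% Sum of log-probabilities of the two true tokens at their masked positions
\log \Pr(x_2 = \text{quick} \mid \bar{x}) + \log \Pr(x_8 = \text{lazy} \mid \bar{x})
```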
A language model is being trained on a sentence where two words have been replaced with a special [MASK] token. The training objective is to maximize the sum of the log-probabilities of the original words at these two masked positions. Why is the objective formulated as a sum of log-probabilities rather than, for example, a product of the probabilities?
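One standard part of the answer (a sketch of the motivation, not quoted from the card): taking logs turns a product of probabilities into a sum, which is numerically stable. Multiplying many small probabilities underflows to zero in floating point, while the equivalent sum of logs stays well-behaved.

```python
import math

# Product of many small probabilities underflows to 0.0 in float64.
probs = [1e-4] * 100
product = 1.0
for p in probs:
    product *= p
print(product)   # 0.0 (underflow: the true value is 1e-400)

# The equivalent sum of log-probabilities is perfectly representable.
log_sum = sum(math.log(p) for p in probs)
print(log_sum)   # approximately -921.034
```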
Evaluating Model Performance in MLM Training