Formula

MLM Loss Function as Negative Log-Likelihood

The loss function for Masked Language Modeling (MLM) calculates the negative log-likelihood of correctly predicting the original tokens at their masked positions. Given a token sequence $\mathbf{x}$ with a set of selected positions $\mathcal{A}(\mathbf{x})$, and its modified version $\bar{\mathbf{x}}$, the MLM loss is formulated as:

$$\mathrm{Loss}_{\mathrm{MLM}} = - \sum_{i \in \mathcal{A}(\mathbf{x})} \log \mathrm{Pr}_i(x_i \mid \bar{\mathbf{x}})$$

In this equation, $\mathrm{Pr}_i(x_i \mid \bar{\mathbf{x}})$ represents the probability of accurately predicting the original token $x_i$ at position $i$ given the modified input sequence $\bar{\mathbf{x}}$.
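The sum above can be computed directly from a model's output logits. The sketch below, using NumPy, assumes a hypothetical `logits` array of shape `(seq_len, vocab_size)` produced by some model from the corrupted sequence $\bar{\mathbf{x}}$; the function names and shapes are illustrative, not from the original text.

```python
import numpy as np

def mlm_loss(logits, original_ids, masked_positions):
    """Negative log-likelihood over masked positions.

    logits           : (seq_len, vocab_size) model outputs for the
                       corrupted sequence x-bar (hypothetical input)
    original_ids     : (seq_len,) original token ids x_i
    masked_positions : indices i in A(x)
    """
    loss = 0.0
    for i in masked_positions:
        # log-softmax over the vocabulary gives log Pr_i(. | x-bar);
        # subtracting the max keeps the exponentials numerically stable
        z = logits[i] - logits[i].max()
        log_probs = z - np.log(np.exp(z).sum())
        # accumulate -log Pr_i(x_i | x-bar) for the original token
        loss -= log_probs[original_ids[i]]
    return loss
```

With uniform (all-zero) logits over a vocabulary of size $V$, each masked position contributes $\log V$, which gives a quick sanity check of the implementation.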


Updated 2026-05-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Computing Sciences