Definition

Probability of a True Token in MLM

In Masked Language Modeling (MLM), the expression $\mathrm{Pr}_k^{\mathbf{W},\theta}(x_k \mid \bar{\mathbf{x}})$ denotes the probability that the model predicts the true token $x_k$ at position $k$, given the corrupted input sequence $\bar{\mathbf{x}}$. This conditional probability depends on the model's learned parameters, represented by the weights $\mathbf{W}$ and $\theta$.
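As an illustration, the probability of the true token can be read off a softmax distribution over the vocabulary at position k. The sketch below is a toy example, not a definitive implementation. It assumes that `h_k` stands in for the encoder output at position k (which the parameters θ would produce from the corrupted input) and that `W` is an output projection onto vocabulary logits. The function names, the four-word vocabulary, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def true_token_probability(h_k, W, true_id):
    """Return the probability assigned to the true token at position k.

    h_k     : hidden vector at position k (assumed to come from the encoder)
    W       : projection matrix, shape [hidden_size][vocab_size] (assumption)
    true_id : vocabulary index of the true token x_k
    """
    hidden_size = len(h_k)
    vocab_size = len(W[0])
    # logits[v] = sum_j h_k[j] * W[j][v]
    logits = [
        sum(h_k[j] * W[j][v] for j in range(hidden_size))
        for v in range(vocab_size)
    ]
    probs = softmax(logits)
    return probs[true_id]


# Toy setup: hypothetical vocabulary and hand-picked values.
vocab = ["[MASK]", "the", "cat", "sat"]
h_k = [0.5, -1.0]
W = [
    [0.0, 1.0, 2.0, 0.5],
    [0.0, 0.5, -1.0, 0.2],
]

p = true_token_probability(h_k, W, vocab.index("cat"))
loss = -math.log(p)  # negative log-likelihood of the true token at position k
```

Training typically minimizes the negative log of this probability, summed over the masked positions, so raising the probability of the true token lowers the loss.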

Updated 2026-04-15

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences