Definition

Predicted Probability Distribution in MLM

Within the Masked Language Modeling (MLM) framework, the notation $\mathbf{p}_k^{\mathbf{W},\theta}$ denotes the model's predicted probability distribution over the vocabulary for the token at position $k$. The distribution is computed from the corrupted input sequence $\bar{\mathbf{x}}$ and the model's trainable parameters $\mathbf{W}$ and $\theta$. During training, it is compared against the true token's distribution to compute the log cross-entropy loss.
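The computation described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the definition itself: $\theta$ is taken to parameterize an encoder that maps $\bar{\mathbf{x}}$ to a hidden vector at position $k$ (stubbed here as the fixed toy vector `h_k`), $\mathbf{W}$ is taken to be an output projection followed by a softmax, and the true token's distribution is taken to be one-hot. The vocabulary, the numbers, and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_distribution(h_k, W):
    # h_k: hidden vector at position k (assumed output of the encoder with parameters theta).
    # W:   d x |V| output projection (assumed role of the parameter matrix W).
    d, vocab_size = len(h_k), len(W[0])
    logits = [sum(h_k[i] * W[i][j] for i in range(d)) for j in range(vocab_size)]
    return softmax(logits)

def cross_entropy(p_k, true_index):
    # With a one-hot target, cross-entropy reduces to the negative
    # log-probability assigned to the true token.
    return -math.log(p_k[true_index])

# Toy example: hypothetical 4-token vocabulary and 3-dimensional hidden state.
vocab = ["the", "cat", "sat", "[MASK]"]
h_k = [0.5, -1.0, 2.0]
W = [
    [0.1, 0.4, -0.3, 0.0],
    [0.2, -0.5, 0.1, 0.0],
    [-0.1, 0.9, 0.3, 0.0],
]

p_k = predicted_distribution(h_k, W)
loss = cross_entropy(p_k, vocab.index("cat"))
print(p_k, loss)
```

In this sketch the loss shrinks as the probability assigned to the true token approaches 1, which is the behavior the training objective rewards.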


Updated 2026-04-15

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences