Multiple Choice

A language model is being trained on a sentence where two words have been replaced with a special [MASK] token. The training objective is to maximize the sum of the log-probabilities of the original words at these two masked positions. Why is the objective formulated as a sum of log-probabilities rather than, for example, a product of the probabilities?
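The core of the answer is that the logarithm is monotonic, so maximizing a sum of log-probabilities is equivalent to maximizing the product of the probabilities, but the log form is numerically stable and decomposes into additive per-position terms. A minimal sketch with hypothetical probability values illustrating the equivalence and the underflow problem:

```python
import math

# Hypothetical probabilities assigned to the correct words at two masked positions.
p1, p2 = 0.02, 0.005

# Equivalence: log is monotonic, so log(p1 * p2) = log p1 + log p2,
# and maximizing one maximizes the other.
assert math.isclose(math.log(p1 * p2), math.log(p1) + math.log(p2))

# Numerical stability: with many masked positions, the raw product of
# small probabilities underflows to 0.0 in floating point, destroying
# the training signal, while the sum of logs stays well-scaled.
probs = [1e-5] * 100
product = math.prod(probs)                    # underflows to 0.0
log_sum = sum(math.log(p) for p in probs)     # about -1151.29, still usable

print(product)   # 0.0
print(log_sum)
```

The additive form also yields simpler gradients: each masked position contributes an independent term to the objective, rather than appearing as a factor inside one large product.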

Updated 2025-10-04

Tags: Ch.1 Pre-training - Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy