Multiple Choice

A language model is being trained on a sentence where two words have been replaced with a special [MASK] token. The training objective is to maximize the sum of the log-probabilities of the original words at these two masked positions. Why is the objective formulated as a sum of log-probabilities rather than, for example, a product of the probabilities?
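The core of the answer is that the logarithm is monotonic, so maximizing a sum of log-probabilities is equivalent to maximizing the product of the probabilities, but the log form is numerically stable and decomposes into additive per-position terms. A minimal sketch with hypothetical probability values illustrating the equivalence and the underflow problem:

```python
import math

# Hypothetical probabilities assigned to the correct words at two masked positions.
p1, p2 = 0.02, 0.005

# Equivalence: log is monotonic, so log(p1 * p2) = log p1 + log p2,
# and maximizing one maximizes the other.
assert math.isclose(math.log(p1 * p2), math.log(p1) + math.log(p2))

# Numerical stability: with many masked positions, the raw product of
# small probabilities underflows to 0.0 in floating point, destroying
# the training signal, while the sum of logs stays well-scaled.
probs = [1e-5] * 100
product = math.prod(probs)                    # underflows to 0.0
log_sum = sum(math.log(p) for p in probs)     # about -1151.29, still usable

print(product)   # 0.0
print(log_sum)
```

The additive form also yields simpler gradients: each masked position contributes an independent term to the objective, rather than appearing as a factor inside one large product.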

Updated 2025-10-04

Tags: Ch.1 Pre-training - Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy