True/False

Consider a model that predicts a masked word in a complete sentence by conditioning on all surrounding words, both before and after the mask. The probability distribution it computes for the masked position has the same fundamental interpretation as the distribution produced by a model that generates a sentence one word at a time, predicting each new word from only the words that precede it.
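One way to probe the statement is to compute both kinds of distribution on a tiny example. The sketch below is illustrative only: the four-sentence joint distribution `JOINT` and the helper `conditional` are hypothetical and stand in for what a trained model would estimate. It compares the distribution over the middle word given only the preceding word (left-to-right prediction) with the distribution given both neighbours (masked prediction).

```python
from collections import defaultdict

# Hypothetical toy joint distribution over three-word sentences (an assumption
# made for illustration, not taken from any real model or corpus).
JOINT = {
    ("the", "cat", "sleeps"): 0.4,
    ("the", "dog", "sleeps"): 0.1,
    ("the", "cat", "barks"): 0.1,
    ("the", "dog", "barks"): 0.4,
}


def conditional(position, context):
    """Return P(word at `position` | the words fixed in `context`).

    `context` maps a position index to the word observed at that position.
    """
    scores = defaultdict(float)
    for sentence, prob in JOINT.items():
        # Keep only sentences that agree with every observed context word.
        if all(sentence[i] == word for i, word in context.items()):
            scores[sentence[position]] += prob
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


# Left-to-right prediction: only the preceding word is visible.
causal = conditional(1, {0: "the"})

# Masked prediction: the words on both sides of the mask are visible.
masked = conditional(1, {0: "the", 2: "barks"})

print("P(x2 | x1)     =", causal)
print("P(x2 | x1, x3) =", masked)
```

In this toy setting the two distributions condition on different information. Only the left-to-right conditionals multiply together, by the chain rule, into the probability of the whole sentence.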

Updated 2025-10-10

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science