Multiple Choice

A language model is being trained to predict the next word in a sequence. For a given training example, the function $\mathcal{L}(\mathbf{p}^{\theta}, \mathbf{p}^{\text{gold}})$ is used to measure the difference between the model's predicted probability distribution over the vocabulary, $\mathbf{p}^{\theta}$, and the true distribution, $\mathbf{p}^{\text{gold}}$, in which the true next word has a probability of 1. If the calculated value of $\mathcal{L}$ is very high for this example, what does this most accurately indicate?
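The question does not name the loss function. As a minimal sketch, assume it is the cross-entropy commonly used for next-word prediction: with a one-hot gold distribution it reduces to the negative log of the probability the model assigns to the true next word. The four-word vocabulary, the probabilities, and the helper name `cross_entropy_one_hot` below are all hypothetical, chosen only to show how the value changes.

```python
import math

def cross_entropy_one_hot(predicted, gold_index):
    """Cross-entropy against a one-hot gold distribution.

    Every term of -sum(p_gold * log(p_theta)) vanishes except the one
    for the true word, leaving -log(p_theta[gold_index]).
    """
    return -math.log(predicted[gold_index])

# Hypothetical 4-word vocabulary; index 2 is the true next word.
gold_index = 2
confident = [0.05, 0.05, 0.85, 0.05]  # most mass on the true word
mistaken = [0.70, 0.25, 0.01, 0.04]   # almost no mass on the true word

low_loss = cross_entropy_one_hot(confident, gold_index)
high_loss = cross_entropy_one_hot(mistaken, gold_index)

print(f"loss when the true word gets 0.85: {low_loss:.3f}")
print(f"loss when the true word gets 0.01: {high_loss:.3f}")
```

Under this assumption the loss depends only on the probability placed on the true word, so it grows without bound as that probability approaches 0.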

Updated 2025-09-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science