Short Answer

Impact of Training Data on Probability

A language model has been pre-trained exclusively on a large corpus of advanced physics research papers. When given the incomplete sentence, 'The best part of waking up is...', the model assigns a very low probability to the token 'coffee' and a much higher probability to the token 'data'. Explain why the model produces this result, based on how it computes probabilities.
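
To make the probability computation concrete, here is a minimal sketch of a count-based bigram model, assuming an invented micro-corpus that stands in for the physics papers (the sentences, the helper next_token_prob, and the smoothing parameters alpha and vocab_size are all illustrative, not part of the question):

    from collections import Counter, defaultdict

    # Toy stand-in for the physics pre-training corpus: 'data' is frequent,
    # 'coffee' never occurs. These sentences are invented for illustration.
    physics_corpus = [
        "the best part of the experiment is data",
        "we analyze the data from the detector",
        "the best part of the analysis is data",
        "the model fits the data well",
    ]

    # Count how often each token follows a given context token.
    bigram_counts = defaultdict(Counter)
    for sentence in physics_corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            bigram_counts[prev][nxt] += 1

    def next_token_prob(context: str, token: str, alpha: float = 0.1,
                        vocab_size: int = 50) -> float:
        """Estimate P(token | context) as a smoothed, normalized count.

        The probability is nothing more than training-corpus statistics:
        tokens never seen after the context get only the tiny smoothing
        floor, while frequent continuations dominate the mass.
        """
        counts = bigram_counts[context]
        total = sum(counts.values())
        return (counts[token] + alpha) / (total + alpha * vocab_size)

    print(f"P('data'   | 'is') = {next_token_prob('is', 'data'):.3f}")    # ~0.300
    print(f"P('coffee' | 'is') = {next_token_prob('is', 'coffee'):.3f}")  # ~0.014

A real language model computes these probabilities with a softmax over learned logits rather than raw counts, but the dependence on the pre-training corpus is the same: a continuation like 'coffee' that essentially never follows such contexts in physics papers receives only the smoothing-floor probability, while 'data', which co-occurs constantly, dominates the distribution.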

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science