Multiple Choice

An AI team is fine-tuning a language model to write compelling short stories. The model generates a story one token at a time. However, they find the model's outputs are becoming repetitive and nonsensical. Their current process involves having a reward model evaluate the entire 500-token story only after it is fully completed, providing a single quality score at the very end. Which of the following best explains why this training setup is failing?
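The setup described above is a sparse-reward, credit-assignment problem: with one score delivered only at the end, the learning signal reaching early tokens is heavily attenuated. A minimal sketch, assuming a REINFORCE-style discounted return (the function name and discount factor here are illustrative, not from any specific library):

```python
# Sketch of the credit-assignment problem with a single terminal reward.
# Each token's learning signal is the discounted sum of future rewards.

def per_token_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for each token position."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

T = 500
# Sparse setup: one quality score (say 1.0) only after the full story.
sparse = [0.0] * (T - 1) + [1.0]
G = per_token_returns(sparse)

# The first token's return has decayed by gamma**(T-1), roughly 0.0066,
# so the earliest word choices receive almost no feedback.
print(f"G[0]  = {G[0]:.4f}")
print(f"G[-1] = {G[-1]:.4f}")
```

With 500 tokens and a single end-of-story score, the return seen by the first token is about 150x weaker than the one seen by the last, which is one common way to motivate denser or intermediate rewards.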

Updated 2025-10-05

Tags

Ch.4 Alignment - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science