Learn Before
Multiple Choice

An AI development team is fine-tuning a pre-trained language model on a dataset of human preferences, where each example consists of a prompt, a preferred response, and a rejected response. As training progresses, they notice that while the model is learning to generate responses aligned with these preferences, its general language quality is deteriorating: it produces increasingly repetitive and nonsensical text. Given the design of the optimization objective, what is the most probable cause of this issue?
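
For context, the objective described here is a pairwise preference loss of the kind used in direct preference optimization (DPO). Below is a minimal sketch in PyTorch, assuming summed per-response log-probabilities have already been computed; the function and tensor names (preference_loss, policy_chosen_logps, and so on) are hypothetical, not from any specific library. The design element to notice is the frozen reference model's log-probabilities, which act as an implicit KL anchor keeping the policy close to the pre-trained distribution while it learns the preference ranking.

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style pairwise loss over summed per-response log-probs."""
    # Implicit rewards: beta-scaled log-ratios of the trainable policy
    # against a frozen reference copy of the pre-trained model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Minimizing this pushes the preferred response's reward above
    # the rejected one's, i.e. it fits the human preference ranking.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Hypothetical log-probabilities for a batch of two preference pairs.
pol_c = torch.tensor([-12.0, -9.5])   # policy, preferred responses
pol_r = torch.tensor([-13.0, -11.0])  # policy, rejected responses
ref_c = torch.tensor([-12.5, -10.0])  # reference, preferred responses
ref_r = torch.tensor([-12.8, -10.5])  # reference, rejected responses
print(preference_loss(pol_c, pol_r, ref_c, ref_r))
```

If that anchoring term is removed, or the objective is otherwise driven toward pure margin maximization, the loss can still fall while the policy lowers the probability of both responses and drifts away from fluent text, which is consistent with the degradation the question describes.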

Updated 2025-09-26

Tags: Ch.4 Alignment - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Application in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science