Multiple Choice

A language model is being fine-tuned using a specific process: for any given input prompt, the model generates two responses. A separate, pre-trained 'reward' system then scores both responses, and the language model's parameters are adjusted to make it more likely to produce responses that receive a high score. After extensive fine-tuning with this method, developers notice the model has become very good at generating responses that are stylistically polished, highly confident, and persuasive, but are often factually incorrect. What is the most likely cause of this outcome, based on the mechanics of the described fine-tuning objective?
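The optimization loop described above can be illustrated with a toy, REINFORCE-style sketch. Everything here is a simplifying assumption for illustration: the "policy" is just a pair of logits over two candidate responses, and the reward model's scores are hard-coded so that the polished-but-wrong response scores higher, standing in for a reward model that rewards style over factual accuracy.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "policy": logits over two candidate responses to one prompt.
# Index 0 = accurate but plain; index 1 = polished, confident, but wrong.
logits = [0.0, 0.0]

# Hypothetical reward-model scores: the proxy reward prefers the
# persuasive response regardless of its factual accuracy.
rewards = [0.2, 1.0]

lr = 0.5
for _ in range(100):
    probs = softmax(logits)
    # Gradient of expected reward E[r] = sum_i p_i * r_i with respect
    # to logit_j is p_j * (r_j - E[r]): mass shifts toward whichever
    # response the reward model scores higher.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    for i in range(len(logits)):
        logits[i] += lr * probs[i] * (rewards[i] - baseline)

probs = softmax(logits)
print(probs)
```

Running the loop drives nearly all probability mass onto the higher-scored response. Nothing in the objective references ground truth, only the reward model's score, so any systematic bias in that score (here, toward polish and confidence) is amplified by training.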


Updated 2025-10-06


Tags: Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Evaluation in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science