Multiple Choice

An engineering team implements a system to improve a language model's output. For each user query, the system generates 10 candidate responses and then uses a highly accurate reward model to select the best one. Despite the reward model's high accuracy, the team observes that the selected response is rarely a significant improvement over any of the other 9 candidates. Which of the following is the most likely underlying cause of this lack of significant improvement?
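For reference, the setup described in the question can be sketched roughly as follows. This is a minimal illustration only: `best_of_n`, `selection_gain`, `toy_generate`, and `toy_reward` are hypothetical names that do not come from the source, and the toy generator and length-based reward are placeholders for a real language model and reward model.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    query: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 10,
) -> Tuple[str, List[float]]:
    """Sample n candidates and return the top-scoring one plus all scores."""
    candidates = [generate(query) for _ in range(n)]
    scores = [reward(query, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index], scores


def selection_gain(scores: List[float]) -> float:
    """Gap between the selected (maximum) score and the mean candidate score."""
    return max(scores) - sum(scores) / len(scores)


# Hypothetical stand-ins so the sketch runs without a real model.
_rng = random.Random(0)
_TOY_RESPONSES = [
    "short answer",
    "a somewhat longer answer",
    "the longest answer of the three",
]


def toy_generate(query: str) -> str:
    # Placeholder for sampling one response from a language model.
    return _rng.choice(_TOY_RESPONSES)


def toy_reward(query: str, response: str) -> float:
    # Placeholder scoring rule (assumption): more words means a higher score.
    return float(len(response.split()))


if __name__ == "__main__":
    best, scores = best_of_n("example query", toy_generate, toy_reward)
    print(best, selection_gain(scores))
```

Here `selection_gain` simply quantifies the observation in the question: how far the selected response's score sits above the average score of the candidate pool.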

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science