Multiple Choice

A team is developing a model to solve complex logic puzzles. Their improvement strategy involves having the model generate multiple potential solutions for each puzzle. They then use an automated system to check if the final answer for each solution is correct. All solutions that yield the correct final answer are collected and used to further train the model. After several cycles, they are surprised to find the model's underlying problem-solving process has not reliably improved. Which of the following best explains the critical flaw in their training loop?
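
The loop described in the question can be sketched roughly as follows. This is a minimal illustration, not the team's actual system: the `Candidate` class, the `filter_by_final_answer` function, and the toy puzzle data are all hypothetical names introduced here. It shows only the collection step, where the automated check looks at the final answer and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical container for one generated solution.
    steps: list          # intermediate reasoning; never inspected by the filter
    final_answer: str    # the only field the automated check looks at


def filter_by_final_answer(candidates, correct_answer):
    """Keep every candidate whose final answer matches the known answer."""
    return [c for c in candidates if c.final_answer == correct_answer]


# Toy data (assumed for illustration): three sampled solutions to one puzzle.
candidates = [
    Candidate(steps=["A implies B", "B implies C"], final_answer="C"),
    Candidate(steps=["guess"], final_answer="C"),
    Candidate(steps=["A implies B", "B implies D"], final_answer="D"),
]

# Everything that passes the check would be added to the next training set.
training_set = filter_by_final_answer(candidates, correct_answer="C")
print(len(training_set))
```

In this toy run, the filter treats the first two candidates identically, because it compares final answers only.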

Updated 2025-09-28

Tags: Ch.5 Inference - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science