Multiple Choice

A research team is training a language model to solve multi-step physics problems. The model is trained on a dataset of problems and their final numerical answers. The training process provides a positive reward only if the model's final answer is correct. After extensive training, the model still struggles, often making logical errors in the intermediate steps of its reasoning. Which of the following best explains the fundamental flaw in this training approach?

0

1

Updated 2025-10-03

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science