A research team is training a language model to solve multi-step physics problems. The model is trained on a dataset of problems and their final numerical answers. The training process provides a positive reward only if the model's final answer is correct. After extensive training, the model still struggles, often making logical errors in the intermediate steps of its reasoning. Which of the following best explains the fundamental flaw in this training approach?
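The flaw hinges on the reward depending only on the final answer. A minimal sketch (with hypothetical names, not any particular library's API) of the outcome-only reward described above shows why intermediate reasoning receives no training signal:

```python
def outcome_only_reward(reasoning_steps, final_answer, gold_answer):
    """Return 1.0 iff the final numerical answer matches the gold answer.

    Note that `reasoning_steps` is ignored entirely: a chain full of
    logical errors that happens to land on the right number is rewarded
    exactly like a fully correct derivation.
    """
    return 1.0 if abs(final_answer - gold_answer) < 1e-6 else 0.0

# A flawed chain with a lucky final answer gets full reward...
flawed = outcome_only_reward(
    ["F = ma", "a = 2 (sign error cancels)", "F = 20"], 20.0, 20.0)
# ...while a careful chain with a small slip at the end gets none.
careful = outcome_only_reward(
    ["F = ma", "a = 2", "F = 19.9 (rounding slip)"], 19.9, 20.0)
print(flawed, careful)  # 1.0 0.0
```

Because the reward function never inspects `reasoning_steps`, the model gets no gradient toward logically sound intermediate steps, which is consistent with the persistent intermediate errors the question describes.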
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing a Flawed LLM Training Strategy
Evaluating LLM Training Strategies for a Tutoring Application