Short Answer

Analyzing a Reward System's Weakness

A language model is being trained to solve multi-step math word problems. The training system checks only whether the model's final numerical answer is correct: a correct final answer earns a positive reward, and any other answer earns a negative reward. Describe a significant potential weakness of this training approach with respect to the model's actual problem-solving ability.
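
As a minimal sketch of the reward rule described in the question, the following Python function scores a response from its final number alone. The name `outcome_reward`, the reward values of +1 and -1, and the tolerance `tol` are illustrative assumptions, not part of the original question.

```python
def outcome_reward(model_answer: float, reference_answer: float, tol: float = 1e-9) -> float:
    """Return +1.0 if the final answer matches the reference, else -1.0.

    Assumption: rewards are +1 / -1; the question only says "positive" and "negative".
    """
    return 1.0 if abs(model_answer - reference_answer) <= tol else -1.0


# Only the final number is passed in; the intermediate steps are never inspected.
reward_right = outcome_reward(42.0, 42.0)  # final answer matches the reference
reward_wrong = outcome_reward(41.0, 42.0)  # final answer does not match
```

Note that the function takes no argument for the model's reasoning steps, which mirrors the setup the question asks you to analyze.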

Updated 2025-10-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science