Evaluating a Reward Mechanism for a Financial AI
Based on the following case study, identify the primary limitation of the reward mechanism being used and explain why this limitation is particularly problematic for a high-stakes task.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being trained to solve math problems. The training process uses a reward system that provides feedback based only on whether the final numerical answer is correct or incorrect. The model is given the problem (5 * 4) + (10 / 2) and produces the following reasoning:
Step 1: 5 * 4 = 20
Step 2: 10 / 2 = 4
Step 3: 20 + 4 = 24
Final Answer: 24
How would this reward system evaluate the model's entire response?
Evaluating a Flawed Mathematical Reasoning Process
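The outcome-only reward described in the math example above can be illustrated with a minimal sketch. The function names `outcome_reward` and `check_steps`, the 1/0 reward values, and the tuple encoding of the reasoning steps are all assumptions made for illustration; the source only states that feedback depends on whether the final numerical answer is correct. The step-level check is included purely as a contrast and is not described in the source.

```python
import operator

# Hypothetical encoding: each step is (left operand, operator, right operand, claimed result).
OPS = {"*": operator.mul, "/": operator.truediv, "+": operator.add}


def outcome_reward(final_answer: float, correct_answer: float) -> int:
    """Outcome-only reward (assumed 1/0 scale): looks at the final answer and nothing else."""
    return 1 if final_answer == correct_answer else 0


def check_steps(steps):
    """Illustrative step-level check: is each claimed result right for its own inputs?"""
    return [OPS[op](a, b) == claimed for a, op, b, claimed in steps]


# The model's reasoning from the example, including its arithmetic slip in Step 2.
steps = [
    (5, "*", 4, 20),   # Step 1
    (10, "/", 2, 4),   # Step 2 (10 / 2 is actually 5)
    (20, "+", 4, 24),  # Step 3 (valid given the value carried over from Step 2)
]

correct = (5 * 4) + (10 / 2)          # true value of the expression
reward = outcome_reward(24, correct)  # one signal for the whole response
step_flags = check_steps(steps)       # per-step correctness, for comparison

print(reward, step_flags)
```

Under these assumptions the whole response receives a single signal, so the reward alone does not indicate which step caused the wrong final answer, whereas the per-step flags isolate Step 2.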