Learn Before
Example of an Outcome-Based Reward Model in Mathematics
One practical application of an outcome-based reward model is evaluating mathematical calculations. Here the model gives positive feedback for a correct final answer and negative feedback for an incorrect one, without assessing the intermediate steps of the calculation.
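As a rough illustration of the idea, the following minimal sketch scores a response by its final answer alone. The function name `outcome_reward`, the reward values +1.0 and -1.0, and the toy problem are assumptions made for this example and do not come from the course material; a trained reward model would be learned rather than hand-written.

```python
from fractions import Fraction


def outcome_reward(final_answer: str, reference: str) -> float:
    """Hypothetical outcome-based reward: looks only at the final answer.

    Returns +1.0 when the final answer equals the reference value and
    -1.0 otherwise. The +1/-1 scale is an assumption for illustration.
    """
    try:
        # Fraction parses integers, decimals, and "a/b" strings exactly.
        correct = Fraction(final_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        # An unparseable final answer is treated as incorrect.
        correct = False
    return 1.0 if correct else -1.0


# Toy problem (assumed for illustration): 2 + 2 * 3, reference answer 8.
flawed_steps = ["2 * 3 = 5", "2 + 5 = 8"]  # both steps are wrong
# The steps are never passed to the reward function, so they cannot
# influence the score in either direction.
print(outcome_reward("8", "8"))  # correct final answer, flawed steps
print(outcome_reward("7", "8"))  # incorrect final answer
```

Because the reasoning steps never reach the scoring function, a response with faulty intermediate work receives the same reward as a fully correct one whenever the final answers agree.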
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of an Outcome-Based Reward Model in Mathematics
Insufficiency of Outcome-Based Rewards for Complex Reasoning
A company is training a language model to act as an automated assistant for processing loan applications. The model must follow a specific, legally mandated, multi-step procedure to ensure fairness and compliance (e.g., checking credit history, verifying income, providing specific disclosures). The company decides to train the model using a system that provides a large positive reward only if the final loan decision (approve/deny) is correct based on the applicant's overall profile. What is the most significant weakness of this training strategy?
Evaluating Reward Model Suitability
Reward Model Suitability for a Creative Task
Learn After
A language model is being trained to solve math problems. The training process uses a reward system that provides feedback based only on whether the final numerical answer is correct or incorrect. The model is given the problem (5 * 4) + (10 / 2) and produces the following reasoning:
Step 1: 5 * 4 = 20
Step 2: 10 / 2 = 4
Step 3: 20 + 4 = 24
Final Answer: 24
How would this reward system evaluate the model's entire response?
Evaluating a Reward Mechanism for a Financial AI
Evaluating a Flawed Mathematical Reasoning Process