Example of Outcome-Based Reward for a Mathematical Task
Mathematical calculation is a natural fit for outcome-based rewards. Because correctness can be verified from the final output alone, a reward model can be trained to assign a positive reward when the model's final answer is correct and a negative reward when it is incorrect.
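The scoring rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a hand-written rule-based checker as a stand-in for a trained reward model, it assumes the final answer is the last number appearing in the output, and the names `extract_final_answer` and `outcome_reward` and the reward values +1.0 and -1.0 are hypothetical choices, not taken from the source.

```python
import re
from fractions import Fraction
from typing import Optional

# Assumption: the final answer is the last integer or decimal in the text.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_answer(output: str) -> Optional[Fraction]:
    """Return the last number found in the model output, or None if there is none."""
    matches = _NUMBER.findall(output)
    if not matches:
        return None
    # Fraction parses integer and decimal strings exactly (no float rounding).
    return Fraction(matches[-1])


def outcome_reward(output: str, reference: str) -> float:
    """Score only the final answer: +1.0 if it equals the reference, else -1.0.

    Intermediate steps are ignored entirely, which is what makes this an
    outcome-based reward rather than a process-based one.
    """
    predicted = extract_final_answer(output)
    if predicted is not None and predicted == Fraction(reference):
        return 1.0
    return -1.0


if __name__ == "__main__":
    print(outcome_reward("First, 6 * 7 = 42. The answer is 42.", "42"))
    print(outcome_reward("6 * 7 = 48", "42"))
```

Because only the final number is compared, an output with a flawed intermediate step but a correct final answer still receives the positive reward. That is the trade-off of scoring outcomes alone: verification is simple and unambiguous, but errors in the reasoning process go unobserved.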
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Aspect-Based Sentiment Analysis as an Example of Granular Evaluation
Segment-Based Reward Computation
Importance of Step-by-Step Supervision for Complex LLM Reasoning Tasks
Debugging Common C Syntax Errors: A 'Hello, World!' Example
Example of Outcome-Based Reward for a Mathematical Task
A research team is fine-tuning a language model on two different tasks. For which of the following tasks would a reward system that only provides a single score based on the final output's correctness be the least effective for identifying and correcting errors in the model's generation process?
LLMs for Textual Error Correction
Diagnosing a Flawed LLM Training Strategy
Critique of a Training Method for a Story-Writing AI
Aspect-Based Sentiment Analysis (ABSA)
Process-Based Supervision for Complex Reasoning
Learn After
A team of engineers is developing a system to automatically provide feedback to a language model during its training phase. They decide to use a simple reward mechanism: a positive signal is given if the model's final output is correct, and a negative signal is given if it is incorrect. For which of the following tasks would this reward mechanism be the most effective and least ambiguous?
Reward Strategy for a Mathematical AI
Evaluating a Reward Strategy for an AI Math Tutor