Short Answer

Reward Strategy for a Mathematical AI

You are training a language model to solve basic arithmetic problems like 'What is 135 + 258?'. You decide to implement a reward system that gives a positive score only if the model's final numerical answer is correct, and a negative score otherwise. Analyze why this simple, outcome-focused reward strategy is an effective approach for this specific task.
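The reward rule described in the question can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `extract_final_number` and `outcome_reward` are hypothetical, the +1.0 / -1.0 values are assumed stand-ins for "positive" and "negative" scores, and treating the last integer in the response as the final answer is a simplifying assumption.

```python
import re
from typing import Optional


def extract_final_number(response: str) -> Optional[int]:
    """Return the last integer found in the response, or None if there is none.

    Assumption: the model's final answer is the last integer it writes.
    """
    matches = re.findall(r"-?\d+", response)
    return int(matches[-1]) if matches else None


def outcome_reward(response: str, expected: int) -> float:
    """Score only the outcome: +1.0 for a correct final answer, -1.0 otherwise.

    The intermediate reasoning in the response is ignored entirely.
    """
    return 1.0 if extract_final_number(response) == expected else -1.0


if __name__ == "__main__":
    expected = 135 + 258
    print(outcome_reward("135 + 258 = 393", expected))
    print(outcome_reward("The answer is 383.", expected))
    print(outcome_reward("I am not sure.", expected))
```

Because an arithmetic problem has exactly one correct answer that can be checked mechanically, a check like this is cheap to compute and leaves no ambiguity about what counts as success.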

Updated 2025-10-04

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science