Multiple Choice

A language model is being trained to solve math problems. The training process uses a reward system that provides feedback based only on whether the final numerical answer is correct or incorrect. The model is given the problem (5 * 4) + (10 / 2) and produces the following reasoning:

Step 1: 5 * 4 = 20
Step 2: 10 / 2 = 4
Step 3: 20 + 4 = 24
Final Answer: 24

How would this reward system evaluate the model's entire response?

0

1
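The scoring rule in the question can be made concrete with a minimal Python sketch of an outcome-only reward. It assumes the response ends with a `Final Answer:` line and that answers are plain decimal numbers. The function name `outcome_reward` and the `tol` tolerance are hypothetical choices for illustration, not part of any particular training framework.

```python
import re


def outcome_reward(response: str, reference: float, tol: float = 1e-9) -> int:
    """Hypothetical outcome-only reward: 1 if the final answer matches, else 0.

    The intermediate steps are never inspected; only the number after the
    last "Final Answer:" marker is compared with the reference value.
    """
    answers = re.findall(r"Final Answer:\s*(-?\d+(?:\.\d+)?)", response)
    if not answers:
        return 0  # no parsable final answer is treated as incorrect
    return int(abs(float(answers[-1]) - reference) <= tol)


response = (
    "Step 1: 5 * 4 = 20\n"
    "Step 2: 10 / 2 = 4\n"
    "Step 3: 20 + 4 = 24\n"
    "Final Answer: 24"
)
reference = (5 * 4) + (10 / 2)  # evaluates to 25.0
print(outcome_reward(response, reference))  # prints 0
```

Because only the value after `Final Answer:` is compared, the sketch gives the same score whether zero, one, or all of the intermediate steps are correct. A response with a faulty step but a correct final number would still score 1.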

Updated 2025-10-07


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy
