Multiple Choice

An AI team is training a model to solve complex, multi-step mathematical word problems. They are considering two different methods for providing feedback during training:

Method 1: The model generates the entire step-by-step solution and the final answer. It receives a positive reward only if the final numerical answer is correct.

Method 2: The model generates the solution one step at a time. It receives a positive reward for each individual step that is logically correct and follows from the previous one, regardless of the final answer.
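To make the difference between the two feedback signals concrete, here is a minimal sketch. It assumes each solution step is written as a single arithmetic line of the form `a op b = c`. It also uses a hypothetical rule-based checker (`check_step`) in place of whatever step judge a real team would use, such as human labels or a learned reward model. `outcome_reward` models Method 1: one reward at the end, based only on the final answer. `process_reward` models Method 2: one reward per step. All names here are illustrative assumptions, not a specific library's API.

```python
import operator
from typing import List

# Hypothetical step format: "a op b = c", e.g. "3 * 4 = 12".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def step_result(step: str) -> float:
    """Return the number on the right-hand side of a step."""
    return float(step.split("=")[1])


def check_step(previous: List[str], step: str) -> bool:
    """Hypothetical step checker: the arithmetic must be correct, and the
    step must build on the previous step's result (if there is one)."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    ok = OPS[op](float(a), float(b)) == float(rhs)
    if previous:
        ok = ok and float(a) == step_result(previous[-1])
    return ok


def outcome_reward(steps: List[str], correct_answer: float) -> List[float]:
    """Method 1: zero reward on every step except possibly the last,
    which is rewarded only if the final answer matches."""
    rewards = [0.0] * len(steps)
    if steps and step_result(steps[-1]) == correct_answer:
        rewards[-1] = 1.0
    return rewards


def process_reward(steps: List[str]) -> List[float]:
    """Method 2: each step is rewarded independently of the final answer."""
    return [1.0 if check_step(steps[:i], s) else 0.0 for i, s in enumerate(steps)]


if __name__ == "__main__":
    # Problem: compute 3 * 4 + 5. The second step contains an error,
    # yet the third step happens to land on the correct answer, 17.
    flawed = ["3 * 4 = 12", "12 + 5 = 18", "18 - 1 = 17"]
    print(outcome_reward(flawed, 17.0))  # rewards the lucky final answer
    print(process_reward(flawed))        # flags the faulty middle step
```

In this sketch the flawed solution earns full credit under the outcome-only signal. The per-step signal assigns zero to the incorrect middle step, which shows where each method places its feedback.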

Which method is more likely to produce a model that can reliably solve new, unseen complex problems, and why?


Updated 2025-10-07


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science