Case Study

Selecting a Reward Model for a Math Tutoring LLM

Your team is developing an LLM to act as a math tutor. The primary goal is for the LLM to generate solutions to word problems that are not only correct but also demonstrate a clear and valid step-by-step reasoning process for students to learn from. The team is debating how to design the reward model for reinforcement learning. Which of the two approaches described below would be more effective for achieving the project's primary goal, and why? Justify your choice by explaining the potential pitfalls of the rejected approach.

Updated 2025-10-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course
