Rule-Based Reward Models for Reasoning
In some applications of reinforcement learning for LLM reasoning, the reward can be computed from simple, predefined rules rather than by a model learned from data. For example, a rule might grant a bonus, or a higher reward, to longer, more detailed outputs in order to encourage the model to generate more elaborate reasoning paths.
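The idea above can be sketched as a small scoring function. This is a minimal, hypothetical illustration, not a method from the text: the rule names (`per_char`, `cap`) and the choice of rewarding explicit "Step N" markers are assumptions made for the example.

```python
import re

def rule_based_reward(response: str, per_char: float = 0.001, cap: float = 1.0) -> float:
    """Score a response with fixed rules instead of a learned reward model.

    Illustrative rules: a capped per-character length bonus, plus a bonus
    for each explicitly numbered reasoning step in the output.
    """
    length_bonus = min(len(response) * per_char, cap)
    step_bonus = 0.5 * len(re.findall(r"(?i)step \d+", response))
    return length_bonus + step_bonus
```

Because the rules are fixed and transparent, the reward needs no training data, but it only captures surface features of the output.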
Ch.5 Inference - Foundations of Large Language Models
Related: Outcome Reward Models, Process Reward Models
A team is training a language model to solve multi-step logic puzzles. Their training system automatically reviews each line of the model's generated reasoning. If a line represents a valid deductive step, it receives a positive score. If a line contains a logical fallacy or contradicts a previous statement, it receives a negative score, and the evaluation stops. The total score for the entire reasoning path is then used to update the model. Which classification best describes this type of feedback mechanism?
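The feedback mechanism described in this question can be sketched directly in code. This is a hypothetical sketch of the scoring loop only: the function and parameter names (`score_reasoning`, `is_valid_step`, `step_reward`, `fallacy_penalty`) are assumptions, and `is_valid_step` stands in for whatever automatic deduction checker the team uses.

```python
def score_reasoning(reasoning_lines, is_valid_step,
                    step_reward=1.0, fallacy_penalty=-1.0):
    """Rule-based, step-level scoring of a reasoning path.

    Each line judged a valid deductive step earns step_reward; the first
    invalid line earns fallacy_penalty and halts evaluation. The running
    total is the reward for the whole path.
    """
    total = 0.0
    for line in reasoning_lines:
        if is_valid_step(line):
            total += step_reward
        else:
            total += fallacy_penalty
            break  # stop at the first fallacy or contradiction
    return total
```

Note that the score is assigned per reasoning step rather than only to the final answer, which is the key property the question asks you to classify.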
Selecting a Reward Model for a Math Tutoring LLM
Match each description of a feedback mechanism for training a reasoning model with the most appropriate classification.
A development team is using reinforcement learning to train a language model to be a helpful math tutor. To encourage the model to provide detailed, step-by-step solutions, they implement a simple reward rule: the model receives a higher reward for generating longer responses that include more mathematical equations. Which of the following describes the most significant potential flaw in this approach?
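The flaw this question points at can be demonstrated with a few lines of code. The reward rule below is a hypothetical rendering of "longer responses with more equations score higher" (the weight on `=` counts is an assumption); the two sample answers show a verbose but incorrect solution outscoring a concise correct one.

```python
def naive_length_reward(response: str) -> float:
    """Hypothetical rule: reward length plus a bonus per equation-like token."""
    return len(response) + 10 * response.count("=")

concise_correct = "x = 4"
verbose_wrong = "First, x + 2 = 7, so x = 7 + 2 = 9. Thus x = 9 is the answer."

# The verbose-but-wrong answer earns the higher reward, so optimizing this
# rule teaches the model to pad its output rather than to solve correctly.
print(naive_length_reward(concise_correct) < naive_length_reward(verbose_wrong))
```

This mismatch between the proxy rule and the true goal (correct tutoring) is the reward-hacking failure mode the question is probing.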
Designing a Reward Rule for Code Generation
Analyzing a Heuristic Reward for a Debate LLM