Concept

Rule-Based Reward Models for Reasoning

In some applications of reinforcement learning for LLM reasoning, the reward model can be defined by simple, predefined rules rather than learned from data. One such rule assigns a bonus, or a higher reward, to longer and more detailed outputs, which encourages the model to generate more elaborate reasoning paths.
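The length rule described above can be sketched as a small scoring function. This is a minimal illustration, not a reference implementation: the function name `length_bonus_reward`, the parameters `bonus_per_token` and `max_bonus`, the whitespace-based token count, and the cap on the bonus are all assumptions introduced here for the example.

```python
def length_bonus_reward(
    output: str,
    bonus_per_token: float = 0.01,  # hypothetical bonus granted per token
    max_bonus: float = 1.0,         # hypothetical cap so the reward stays bounded
) -> float:
    """Rule-based reward: longer outputs score higher, up to max_bonus."""
    # Whitespace splitting stands in for a real tokenizer (an assumption).
    num_tokens = len(output.split())
    return min(max_bonus, bonus_per_token * num_tokens)


# Usage: a longer reasoning path receives a higher reward than a terse answer.
short_output = "The answer is 4."
long_output = (
    "First, 2 + 2 means adding two to two. "
    "Counting up from 2 by two steps gives 3, then 4. "
    "So the answer is 4."
)
assert length_bonus_reward(long_output) > length_bonus_reward(short_output)
```

The cap is a design choice made for this sketch: it bounds the bonus for very long outputs. In practice, a rule like this would typically be one term among several rather than the whole reward.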

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course