Concept

Outcome-Based Reward Models

A reward model can be designed to score an entire input-output sequence at once, assigning a reward to the complete output rather than to its intermediate steps. This approach is well suited to tasks where correctness can be verified by checking the final result alone. It judges the outcome, not the process used to reach it.
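As a rough illustration of the idea, the sketch below scores a complete output by looking only at its final answer. It is a minimal rule-based stand-in, not a learned reward model: the function names `extract_final_answer` and `outcome_reward`, the convention that the last number in the output is the final answer, and the 0/1 reward scale are all assumptions made for this example.

```python
import re
from typing import Optional


def extract_final_answer(output: str) -> Optional[str]:
    """Return the last number in the output, or None if there is none.

    Assumption for this sketch: the final answer is the last number written.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", output)
    return matches[-1] if matches else None


def outcome_reward(prompt: str, output: str, reference: str) -> float:
    """Assign one scalar reward to a whole (prompt, output) pair.

    Only the final result is checked; the intermediate steps are ignored.
    The prompt is accepted to mirror the input-output interface, but this
    simple rule-based check does not use it.
    """
    answer = extract_final_answer(output)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0


if __name__ == "__main__":
    prompt = "What is 6 * 7?"
    print(outcome_reward(prompt, "6 * 7 = 40 + 2 = 42", "42"))
    print(outcome_reward(prompt, "6 * 7 = 6 + 7 = 42", "42"))
    print(outcome_reward(prompt, "The answer is 41.", "42"))
```

The second call shows the trade-off of judging only the outcome: an output with flawed reasoning still receives full reward as long as its final answer matches the reference.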

Updated 2025-10-07

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences