Sequence-Level Evaluation in Reward Models
When an RLHF reward model evaluates the relationship between an input prompt and a complete output sequence, it focuses on the full semantic content rather than token-level accuracy. At each intermediate position in the output sequence, the model assigns a default value of 0 (or another predetermined value). The actual scalar reward score is generated only at the final position (i.e., the last token), reflecting the quality of the completed text.
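Below is a minimal PyTorch sketch of this idea, assuming a Hugging Face-style Transformer backbone that returns per-position hidden states; the names `RewardModel`, `backbone`, and `value_head` are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward model: a Transformer backbone plus a scalar value head."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                     # e.g., a Transformer decoder (assumed)
        self.value_head = nn.Linear(hidden_size, 1)  # maps each hidden state to one scalar

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Hidden states for every position: [batch, seq_len, hidden_size].
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state

        # Per-position scalars: [batch, seq_len]. Scores at intermediate
        # positions are discarded (effectively the default value of 0);
        # only the final real token carries the sequence-level reward.
        scores = self.value_head(hidden).squeeze(-1)
        last_index = attention_mask.sum(dim=1) - 1   # index of the last non-padding token
        reward = scores.gather(1, last_index.unsqueeze(1)).squeeze(1)
        return reward                                # [batch]: one scalar per sequence
```

Note that the head still produces a score at every position; reading out only the last one is what makes the evaluation sequence-level rather than token-level.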