Concept

Sequence-Level Evaluation in Reward Models

When an RLHF reward model evaluates the relationship between an input prompt $\mathbf{x}$ and a complete output sequence $\mathbf{y} = y_1 \ldots y_n$, it focuses on full semantic content rather than token-level accuracy. At each intermediate position $t$ in the output sequence, the model assigns a default value of $0$ (or another predetermined value). The actual scalar reward score $r(\mathbf{x}, \mathbf{y})$ is generated only at the final position ($t = n$), reflecting the quality of the completed text.
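
To make the shape of this reward signal concrete, here is a minimal Python sketch. The names `sequence_level_rewards`, `score_fn`, and `toy_score` are illustrative placeholders rather than any particular RLHF library's API; `score_fn` stands in for a trained reward model that maps a prompt and a complete output to a scalar.

```python
def sequence_level_rewards(score_fn, prompt, tokens):
    """Per-position rewards under sequence-level evaluation.

    Every intermediate position t < n receives the default value 0;
    only the final position t = n carries the scalar reward r(x, y)
    for the completed output.
    """
    rewards = [0.0] * len(tokens)           # default value at each t < n
    rewards[-1] = score_fn(prompt, tokens)  # r(x, y) assigned at t = n
    return rewards


# Hypothetical stand-in for a trained reward model: it simply favors
# longer completions, purely to keep the example self-contained.
def toy_score(prompt, tokens):
    return len(tokens) / 10.0


print(sequence_level_rewards(toy_score, "Explain RLHF.",
                             ["It", "aligns", "models", "."]))
# [0.0, 0.0, 0.0, 0.4]
```

In PPO-style fine-tuning this per-position reward vector is what gets combined with a token-level KL penalty, which is why the zero placeholders at intermediate positions matter: they keep the reward sparse while the sequence-level score arrives only once, at $t = n$.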

Tags

Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences