Formula

Notation for the RLHF Reward Model

The function of the reward model in RLHF is expressed as $r = \text{Reward}(\mathbf{x}, y)$, where $r$ is the scalar reward, $\mathbf{x}$ is the input prompt, and $y$ is the generated output. The reward $r$ measures how well the output $y$ aligns with desired behavior for the input $\mathbf{x}$. For notational simplicity, this function is often denoted $r(\mathbf{x}, y)$.
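The signature above can be sketched in code. This is a minimal, hypothetical stand-in: a real reward model is a neural network trained on human preference data, whereas the toy heuristic below only illustrates the interface $(\mathbf{x}, y) \mapsto r$, mapping a prompt and an output to a single scalar.

```python
def reward(x: str, y: str) -> float:
    """Toy reward model: map a prompt x and output y to a scalar reward.

    Hypothetical heuristic for illustration only: reward outputs whose
    words overlap with the prompt, with a small penalty on length.
    """
    prompt_words = set(x.lower().split())
    output_words = y.lower().split()
    if not output_words:
        return 0.0
    # Fraction of output words that also appear in the prompt,
    # minus a mild length penalty.
    overlap = sum(1 for w in output_words if w in prompt_words)
    return overlap / len(output_words) - 0.001 * len(output_words)


# The interface is what matters: any reward model consumes (x, y)
# and produces one scalar r, which the RL step then maximizes.
r = reward("Explain the tides", "The tides are caused by the moon")
```

Whatever the internals, downstream RLHF code only relies on this scalar interface, which is why the shorthand $r(\mathbf{x}, y)$ suffices in most derivations.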

Updated 2026-05-01

