Formula

Segment-Based Reward Score Formula

The reward score for the $k$-th segment of a generated output is calculated by a reward model that takes the full context into account. Evaluating a segment involves three inputs: the original input prompt, the entire generated output sequence, and the segment itself. The formula is:

$$r^k = r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)$$

where $r^k$ denotes the reward for the $k$-th segment, $r$ is the reward model function, $\mathbf{x}$ is the initial prompt, $\mathbf{y}$ is the complete output sequence, and $\bar{\mathbf{y}}_k$ is the segment under evaluation.
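The formula above can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `segment_rewards`, `toy_reward_model`, the token lists, and the segment boundaries are hypothetical, and the toy scoring rule (overlap with the prompt) merely stands in for a trained reward model. The point is only the call shape: each segment is scored with the prompt and the full output passed alongside it.

```python
from typing import Callable, List, Sequence

Tokens = Sequence[str]
# A reward model maps (prompt x, full output y, segment y_bar_k) to a scalar.
RewardModel = Callable[[Tokens, Tokens, Tokens], float]


def segment_rewards(
    reward_model: RewardModel,
    x: Tokens,
    y: Tokens,
    segments: Sequence[Tokens],
) -> List[float]:
    """Return [r^1, ..., r^K], where r^k = r(x, y, y_bar_k)."""
    return [reward_model(x, y, segment) for segment in segments]


def toy_reward_model(x: Tokens, y: Tokens, segment: Tokens) -> float:
    """Hypothetical stand-in for a trained reward model.

    Scores a segment by the fraction of its tokens that also occur in the
    prompt. It accepts y to match the formula's signature but does not use
    it; a real model would condition on the full output as well.
    """
    if not segment:
        return 0.0
    prompt_vocab = set(x)
    return sum(tok in prompt_vocab for tok in segment) / len(segment)


# Illustrative data: a prompt, an output, and an assumed split into 3 segments.
x = ["what", "is", "the", "capital", "of", "france"]
y = ["the", "capital", "of", "france", "is", "paris", "i", "like", "cheese"]
segments = [y[0:4], y[4:6], y[6:9]]

rewards = segment_rewards(toy_reward_model, x, y, segments)
print(rewards)
```

Passing `x` and `y` to every call, rather than the segment alone, mirrors the formula: the same segment can deserve a different score depending on the prompt it answers and the output it sits in.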


Updated 2026-05-03


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences