Formula

Aggregated Reward as the Sum of Segment-Based Rewards

The total reward for an input \mathbf{x} and a generated sequence \mathbf{y}, denoted r(\mathbf{x}, \mathbf{y}), can be computed by dividing \mathbf{y} into n segments and summing the rewards of these segments:

r(\mathbf{x}, \mathbf{y}) = \sum_{k=1}^{n} r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k)

Here, \bar{\mathbf{y}}_k denotes the k-th segment of \mathbf{y}, and r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k) is the reward assigned to that segment. Each segment-level reward can depend on the input \mathbf{x}, the full output sequence \mathbf{y}, and the segment \bar{\mathbf{y}}_k itself.
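The aggregation can be illustrated with a minimal Python sketch, treating \bar{\mathbf{y}}_k as the k-th contiguous segment of a tokenized \mathbf{y}. The formula itself does not fix how \mathbf{y} is split or what the segment reward is, so the near-equal-length splitting in `split_into_segments` and the token-overlap reward in `overlap_reward` are hypothetical stand-ins chosen only to make the example runnable. In practice the segment reward would come from a learned reward model.

```python
from typing import Callable, List, Sequence

# Hypothetical signature for a segment-level reward r(x, y, y_bar_k).
SegmentReward = Callable[[str, Sequence[str], Sequence[str]], float]


def split_into_segments(y: Sequence[str], n: int) -> List[List[str]]:
    """Split y into n contiguous segments of near-equal length (an assumed scheme)."""
    if n <= 0:
        raise ValueError("n must be positive")
    size, extra = divmod(len(y), n)
    segments: List[List[str]] = []
    start = 0
    for k in range(n):
        # The first `extra` segments each take one additional token.
        end = start + size + (1 if k < extra else 0)
        segments.append(list(y[start:end]))
        start = end
    return segments


def aggregated_reward(
    x: str,
    y: Sequence[str],
    segments: Sequence[Sequence[str]],
    segment_reward: SegmentReward,
) -> float:
    """Compute r(x, y) as the sum over k of r(x, y, y_bar_k)."""
    return sum(segment_reward(x, y, seg) for seg in segments)


def overlap_reward(x: str, y: Sequence[str], segment: Sequence[str]) -> float:
    """Toy segment reward (hypothetical): count segment tokens that also occur in x."""
    x_tokens = set(x.split())
    return float(sum(1 for tok in segment if tok in x_tokens))


if __name__ == "__main__":
    x = "the cat sat on the mat"
    y = ["the", "cat", "ran", "home"]
    segs = split_into_segments(y, n=2)  # [['the', 'cat'], ['ran', 'home']]
    print(aggregated_reward(x, y, segs, overlap_reward))  # 2.0
```

Because `aggregated_reward` only sums whatever each segment receives, any segment-level scorer with the same signature can be plugged in without changing the aggregation step.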

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences