Formula

Unit Reward Function for Segments

A simplified reward function can be implemented where the reward for any given segment is a constant value of 1. This is formally expressed as $r(x, y, \bar{y}) = 1$. In this model, the reward is independent of the prompt $x$, the complete response $y$, and the specific content of the segment $\bar{y}$.
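A minimal Python sketch of this definition may help make the independence property concrete. The function names, the string types, and the idea of summing rewards over a list of segments are illustrative assumptions, not part of the source definition.

```python
from typing import Sequence


def unit_segment_reward(prompt: str, response: str, segment: str) -> float:
    """Constant reward r(x, y, y_bar) = 1 for any segment.

    All three arguments are accepted only to mirror the signature of a
    general segment-level reward function; none of them affects the result.
    """
    return 1.0


def total_reward(prompt: str, response: str, segments: Sequence[str]) -> float:
    """Sum the unit reward over segments (hypothetical aggregation).

    With a unit reward, this sum equals the number of segments.
    """
    return sum(
        (unit_segment_reward(prompt, response, seg) for seg in segments),
        0.0,
    )


if __name__ == "__main__":
    # Hypothetical prompt, response, and segmentation, for illustration only.
    x = "Explain gravity."
    y = "Gravity is a force. It attracts masses."
    segs = ["Gravity is a force.", "It attracts masses."]
    print(unit_segment_reward(x, y, segs[0]))
    print(total_reward(x, y, segs))
```

Under the summation assumed here, the accumulated reward depends only on how many segments a response contains, not on what they say.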

Updated 2025-09-22

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences