1Cademy - Formula for a Positive Reward Function $$r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}})$$

Learn Before

Average Value Notation ( $\bar{y}$ )
Examples of Constant Segment-Based Reward Functions

Formula

Formula for a Positive Reward Function $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}})$

The function $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}) = 1$ assigns a constant positive value. This function's arguments are the vectors $\mathbf{x}$ and $\mathbf{y}$ , and the average vector $\bar{\mathbf{y}}$ . In machine learning, a value of 1 often signifies a positive reward or indicates a positive sample.

Updated 2026-07-01

Contributors are:

Who are from:

References

Reference of Foundations of Large Language Models Course

Learn After

A machine learning model uses the reward function r(x, y, ȳ) = 1 to evaluate data segments, where x, y, and ȳ are vectors representing different aspects of the data. If the model processes a segment where x = [0.1, 0.9], y = [1, 0], and ȳ = [0.6, 0.4], what is the reward value assigned to this segment?
For the reward function defined as $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}) = 1$ , the output value is dependent on the specific values of the input vectors $\mathbf{x}$ and $\mathbf{y}$ .
Evaluating a Constant Reward Function

Learn Before

Related

Learn After