Formula

Formula for a Positive Reward Function r(x, y, ȳ)

The function $r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}) = 1$ assigns a constant positive value. Its arguments are the vectors $\mathbf{x}$ and $\mathbf{y}$ and the average vector $\bar{\mathbf{y}}$. In machine learning, a value of 1 often signifies a positive reward or indicates a positive sample.
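As a minimal sketch of the formula above, the constant reward can be written as a function that accepts its three arguments and ignores them. The name `constant_reward` and the use of plain Python sequences to stand in for the vectors are assumptions made for illustration, not part of the source.

```python
from typing import Sequence


def constant_reward(
    x: Sequence[float],
    y: Sequence[float],
    y_bar: Sequence[float],
) -> float:
    """Hypothetical helper: r(x, y, y_bar) = 1 for every input.

    The arguments are accepted only to mirror the signature of the
    formula; the returned value does not depend on them.
    """
    return 1.0


# Different inputs yield the same reward.
r_a = constant_reward([0.1, 0.2], [0.3, 0.4], [0.2, 0.3])
r_b = constant_reward([5.0], [-1.0], [2.0])
```

Both `r_a` and `r_b` equal `1.0`, which is what makes the reward constant: it carries no information that distinguishes one input from another.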


Updated 2025-10-08


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
