Activity (Process)

Training the Value Function with a Reward Model

In reinforcement learning, training the value function depends on a reward model. The reward model supplies the reward signal r_t at each step t, and that reward is used to compute the value function's learning target, for example in the Temporal Difference (TD) error, r_t + γ V(s_{t+1}) - V(s_t), where γ is the discount factor and V(s) is the current value estimate for state s.
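As a rough illustration of how the reward signal enters the learning target, the sketch below runs one-step TD updates on a small tabular value function. Everything here is an assumption made for the example, not part of the source: the three-state setup, the `reward_model` stand-in (a real reward model would be learned), the function names, and the values of the discount factor `gamma` and learning rate `lr`.

```python
# Minimal TD(0) sketch. All names and numbers are hypothetical.

def reward_model(state, next_state):
    """Stand-in for a learned reward model: returns the reward r_t.

    Assumption: reaching state 2 is rewarded, everything else is not.
    """
    return 1.0 if next_state == 2 else 0.0


def td_error(V, state, next_state, reward, gamma=0.9, terminal=False):
    """TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    # No bootstrapping from the next state when the episode has ended.
    bootstrap = 0.0 if terminal else gamma * V[next_state]
    return reward + bootstrap - V[state]


def td_update(V, state, next_state, gamma=0.9, lr=0.5, terminal=False):
    """One TD(0) step: move V(s_t) toward r_t + gamma * V(s_{t+1})."""
    r_t = reward_model(state, next_state)  # reward signal from the reward model
    delta = td_error(V, state, next_state, r_t, gamma, terminal)
    V[state] += lr * delta
    return delta


# Value estimates for states 0, 1, 2, initialised to zero.
V = [0.0, 0.0, 0.0]
delta = td_update(V, state=1, next_state=2, terminal=True)
```

In this sketch the value function never sees the reward model directly; it only sees r_t through the TD target, so any change in the reward model changes what the value function is trained toward.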

Updated 2025-10-10

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences