1Cademy - Training Objective as Maximization of the Performance Function

Learn Before

Objective Function as Expected Cumulative Reward (Performance Function)

Concept

Training Objective as Maximization of the Performance Function

In reinforcement learning, the primary goal of training is to find the policy parameters $\theta$ that maximize the objective (or performance) function $J(\theta)$, which measures the expected cumulative reward obtained by following the policy. Improving the policy therefore means moving $\theta$ toward values with higher $J(\theta)$. Formally, the optimal parameters $\tilde{\theta}$ are given by: $\tilde{\theta} = \arg\max_{\theta} J(\theta)$
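
As a minimal sketch of what $\arg\max_{\theta} J(\theta)$ means in practice, the toy example below evaluates a performance function over a finite set of candidate parameters and picks the best one. Everything in it is an illustrative assumption rather than part of the original material: the two-armed bandit, the reward values in `ARM_REWARDS`, the scalar sigmoid policy, and the helper names `policy`, `performance`, and `argmax_on_grid`. Because each episode has a single step, the expected cumulative reward equals the expected immediate reward.

```python
import math

# Hypothetical expected rewards for a two-armed bandit (illustrative values only).
ARM_REWARDS = (1.0, 3.0)


def policy(theta):
    """Return (P(arm 0), P(arm 1)) for a sigmoid policy with scalar parameter theta."""
    p_arm1 = 1.0 / (1.0 + math.exp(-theta))
    return (1.0 - p_arm1, p_arm1)


def performance(theta):
    """J(theta): expected reward under the policy (one-step episodes)."""
    return sum(p * r for p, r in zip(policy(theta), ARM_REWARDS))


def argmax_on_grid(candidates):
    """Approximate arg max over theta by evaluating J on a finite candidate set."""
    return max(candidates, key=performance)


# Candidate parameters: theta in [-5, 5] with step 0.1 (an assumed search range).
candidates = [x / 10.0 for x in range(-50, 51)]
best_theta = argmax_on_grid(candidates)

print(f"best theta on grid: {best_theta}")
print(f"J(best theta): {performance(best_theta):.4f}")
```

In this toy problem $J(\theta)$ increases with $\theta$, so the search simply returns the largest candidate; the grid bounds, not the problem, limit the answer. Exhaustive search like this only works for tiny parameter spaces, which is why practical methods estimate and follow the gradient of $J(\theta)$ instead.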

Updated 2026-05-01

References

Reference of Foundations of Large Language Models Course

Learn After

Optimal Policy Parameters via Maximization Formula
An engineer is training a system using a reinforcement learning approach. The system's behavior is determined by a set of adjustable parameters. The training process aims to find the parameter values that maximize a specific 'performance function,' which represents the expected cumulative reward. The engineer runs two separate training procedures, Procedure X and Procedure Y, and observes the following final outcomes:
- Procedure X: The final set of parameters results in a performance funct
Evaluating Policy Effectiveness
Identifying Optimal Policy Parameters from Training Data
Basic Policy Gradient Approach
