1Cademy - Optimal Policy Parameters via Maximization Formula

Learn Before

Training Objective as Maximization of the Performance Function

Formula

Optimal Policy Parameters via Maximization Formula

The optimal policy parameters, denoted by $\tilde{\theta}$, are the set of parameters that maximize the objective (or performance) function $J(\theta)$. This optimization problem is formally expressed using the arg max operator:

$$\tilde{\theta} = \underset{\theta}{\arg\max} \, J(\theta)$$

This equation signifies a search for the argument (the specific value of $\theta$) that yields the maximum possible value of the function $J(\theta)$.
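
As a minimal sketch of what the arg max operator returns, the snippet below searches a small grid of candidate values. The function `J`, its peak at 2.0, and the grid itself are hypothetical choices made only for illustration; real policy parameters are high-dimensional and are usually optimized with gradient-based methods rather than by enumeration.

```python
def J(theta: float) -> float:
    """Hypothetical performance function (an assumption), peaked at theta = 2.0."""
    return -(theta - 2.0) ** 2 + 5.0


# Candidate parameter values on a coarse grid: 0.0, 0.5, ..., 4.0 (illustrative only).
candidates = [i * 0.5 for i in range(9)]

# arg max: the parameter value at which J is largest.
theta_opt = max(candidates, key=J)

# max: the performance value attained at that parameter.
j_max = J(theta_opt)

print("arg max:", theta_opt)
print("max:", j_max)
```

Here `theta_opt` plays the role of $\tilde{\theta}$, while `j_max` is the value $J(\tilde{\theta})$; the two are different objects even though they come from the same search.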

Updated 2025-10-08

References

Reference of Foundations of Large Language Models Course

Learn After

An agent's performance is evaluated using a function $J(\theta)$, which depends on a set of parameters $\theta$. The goal is to find the optimal parameters, $\tilde{\theta}$, that maximize this function. The table below shows the performance values for four different sets of parameters. Which of the following represents the optimal parameters, $\tilde{\theta}$?

| Parameter Set ($\theta$) | Performance ($J(\theta)$) |
|---|---|
| A | 150 |
| B | 210 |
| C | 285 |
| D | 240 |

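
For a finite table like this one, the arg max can be read off by a direct lookup. The sketch below stores the four rows in a dictionary (the variable names are illustrative assumptions) and separates the maximizing parameter set from the maximum performance value.

```python
# Performance values copied from the table: parameter set -> J(theta).
performance = {"A": 150, "B": 210, "C": 285, "D": 240}

# arg max: the key (parameter set) whose performance is largest.
best_set = max(performance, key=performance.get)

# max: the largest performance value itself.
best_value = performance[best_set]

print("arg max (parameter set):", best_set)
print("max (performance):", best_value)
```

The lookup returns a parameter-set label for the arg max and a number for the max, which is the distinction the question is testing.
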
A reinforcement learning agent's policy is defined by a set of parameters, $\theta$. After extensive training, it is determined that the performance function, $J(\theta)$, reaches its peak value of 450 when the parameters are set to $\theta = \{'learning\_rate': 0.01, 'discount\_factor': 0.95\}$. According to the optimization objective $\tilde{\theta} = \underset{\theta}{\arg\max} \, J(\theta)$, what does $\tilde{\theta}$ represent in this scenario?

Distinguishing Maximum Value from Optimal Argument

