Basic Policy Gradient Approach
A basic policy gradient approach to reinforcement learning involves three main steps. First, sample a number of state-action sequences (trajectories) by following the current policy. Second, evaluate each sampled trajectory using a performance function that measures the expected cumulative reward. Third, update the policy parameters to maximize this performance function, typically via gradient ascent (equivalently, gradient descent on the negated objective), as in the sketch below.
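As an illustration only, here is a minimal sketch of this three-step loop using the standard REINFORCE estimator. The environment (a two-armed bandit whose second action pays more on average), the softmax policy, and names such as `sample_episode` are assumptions made for this example, not anything specified above.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # policy parameters: one logit per action

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_episode(theta):
    """Step 1: sample a (one-step) trajectory from the current policy."""
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    # Step 2: evaluate the trajectory; in this toy bandit the cumulative
    # reward is a single noisy payoff, and action 1 pays more on average.
    r = rng.normal(loc=[0.0, 1.0][a], scale=0.1)
    return a, r, probs

alpha = 0.1  # learning rate
for step in range(2000):
    a, r, probs = sample_episode(theta)
    # For a softmax policy, grad of log pi(a) is one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # Step 3: gradient *ascent* on expected reward:
    # theta <- theta + alpha * r * grad log pi(a)
    theta += alpha * r * grad_log_pi

print("learned action probabilities:", softmax(theta))  # approaches [0, 1]
```

In practice a baseline is usually subtracted from the reward before the update to reduce the variance of this gradient estimate; the toy bandit above omits it for brevity.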
Tags
Foundations of Large Language Models
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Optimal Policy Parameters via Maximization Formula
An engineer is training a system using a reinforcement learning approach. The system's behavior is determined by a set of adjustable parameters. The training process aims to find the parameter values that maximize a specific 'performance function,' which represents the expected cumulative reward. The engineer runs two separate training procedures, Procedure X and Procedure Y, and observes the following final outcomes:
- Procedure X: The final set of parameters results in a performance function value of 150.
- Procedure Y: The final set of parameters results in a performance function value of 125. However, Procedure Y completed in half the time of Procedure X.
Which statement best evaluates the outcomes in relation to the primary training objective?
Evaluating Policy Effectiveness
Identifying Optimal Policy Parameters from Training Data