1Cademy - For a policy gradient method to be applicable, the cumulative reward function must be differentiable, as its derivative is required when computing the gradient of the policy performance objective.

Learn Before

Advantage of Policy Gradients: Non-Differentiable Reward Functions

True/False

For a policy gradient method to be applicable, the cumulative reward function must be differentiable, as its derivative is required when computing the gradient of the policy performance objective.

Updated 2025-10-06

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

In policy gradient methods, the gradient of the performance objective is estimated as an expectation over trajectories. Each trajectory's contribution to this estimate is the product of its cumulative reward and the gradient of its log-probability. Given this structure, why can these methods effectively handle tasks with non-differentiable reward functions, such as a simple binary reward for winning or losing a game?
Applicability of Policy Gradients with Discrete Rewards
For a policy gradient method to be applicable, the cumulative reward function must be differentiable, as its derivative is required when computing the gradient of the policy performance objective.

Learn Before

Related