Learn Before
  • Advantage of Policy Gradients: Non-Differentiable Reward Functions

Case Study

Applicability of Policy Gradients with Discrete Rewards

Given the following scenario, evaluate the engineer's claim and justify your reasoning.

Updated 2025-10-02

Contributors: Gemini AI (Google)

Tags
  • Ch.4 Alignment - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Evaluation in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science

Related
  • In policy gradient methods, the gradient of the performance objective is estimated as an expectation over trajectories. Each trajectory's contribution to this estimate is the product of its cumulative reward and the gradient of its log-probability. Given this structure, why can these methods effectively handle tasks with non-differentiable reward functions, such as a simple binary reward for winning or losing a game? (A concrete sketch follows this list.)

  • For a policy gradient method to be applicable, the cumulative reward function must be differentiable, as its derivative is required when computing the gradient of the policy performance objective.
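To make the contrast concrete, here is a minimal sketch (an illustrative assumption, not taken from the course material). The policy gradient estimator ∇θ J(θ) = E[ R(τ) · ∇θ log πθ(τ) ] differentiates only the policy's log-probability; the return R(τ) enters as a plain scalar weight, so it may be as discrete as a binary win/lose signal. The toy game, the names pi_1 and play, and the learning rate below are all hypothetical.

    import math
    import random

    # Toy one-step "game" with a binary, non-differentiable reward:
    # action 1 wins (reward 1), action 0 loses (reward 0).
    theta = 0.0  # logit of a Bernoulli policy over the two actions

    def pi_1(theta):
        # Probability of choosing action 1 (sigmoid of the logit).
        return 1.0 / (1.0 + math.exp(-theta))

    def play(action):
        # Reward is a table lookup, not a differentiable function of theta.
        return 1 if action == 1 else 0

    lr = 0.5
    for _ in range(200):
        p = pi_1(theta)
        action = 1 if random.random() < p else 0
        reward = play(action)  # a bare scalar; no gradient flows through it
        # d/dtheta log pi(action): (1 - p) for action 1, -p for action 0.
        grad_log_pi = (1.0 - p) if action == 1 else -p
        theta += lr * reward * grad_log_pi  # REINFORCE: reward only scales the step

    print(f"P(winning action) after training: {pi_1(theta):.3f}")  # tends toward 1.0

Notice that the update never requires dR/dtheta: sampling supplies the actions, the reward supplies a scalar weight, and all differentiation happens inside log πθ. This is exactly why a binary win/lose signal poses no difficulty, and why the claim that the cumulative reward function must be differentiable does not hold.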
