Optimization Verification for Reinforcement Learning Reward Functions

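When a reinforcement learning system produces a trajectory that is worse than a human pilot's trajectory, compare the reward that the reward function assigns to the human trajectory with the reward it assigns to the algorithm's trajectory. If the human trajectory receives the higher reward, the reward function correctly prefers the better behavior and the algorithm failed to find it, so improving the reinforcement learning algorithm is worthwhile. If it does not, the reward function fails to rank the better trajectory higher, so optimizing it more effectively would not help; improve the reward function instead.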
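The decision rule above can be sketched in Python as follows. This is a minimal, hypothetical sketch: representing a trajectory as a list of floats, the function names `optimization_verification` and `smoothness_reward`, and the smoothness-based reward are illustrative assumptions, not part of the original. The sketch also assumes the human trajectory is already known to be the better one.

```python
from typing import Callable, Sequence

# Hypothetical representation: a trajectory is a sequence of scalar states.
Trajectory = Sequence[float]
RewardFn = Callable[[Trajectory], float]


def optimization_verification(reward: RewardFn,
                              human_traj: Trajectory,
                              rl_traj: Trajectory) -> str:
    """Attribute the RL system's shortfall, assuming human_traj is the better trajectory."""
    r_human = reward(human_traj)
    r_rl = reward(rl_traj)
    if r_human > r_rl:
        # The reward function prefers the better trajectory, so the
        # optimizer failed to find a high-reward trajectory.
        return "improve RL algorithm"
    # The reward function does not prefer the better trajectory (ties
    # included), so maximizing it will not yield better behavior.
    return "improve reward function"


def smoothness_reward(traj: Trajectory) -> float:
    # Hypothetical reward: penalize large changes between consecutive states.
    return -sum((b - a) ** 2 for a, b in zip(traj, traj[1:]))


if __name__ == "__main__":
    human = [0.0, 0.1, 0.2, 0.3]   # smooth, assumed better trajectory
    rl = [0.0, 0.5, -0.2, 0.3]     # jerky trajectory from the RL system
    print(optimization_verification(smoothness_reward, human, rl))
```

With these inputs the smoothness reward scores the human trajectory higher, so the sketch prints `improve RL algorithm`. Swapping in a reward that scores the jerky trajectory at least as high would instead point to the reward function.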


Updated 2026-05-25


Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy