Learn Before
Analyzing Reward Function Invariance
Consider two reward functions, r and r', related by the equation r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + f(s_t, a_t, s_{t+1}). Explain why it is possible for an agent to learn the exact same optimal behavior under both r and r', even when the function f is not always zero. What does this phenomenon reveal about the challenge of inferring a single, true reward function from observing an agent's actions?
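One well-known setting in which this holds is potential-based reward shaping, where f(s_t, a_t, s_{t+1}) = gamma * Phi(s_{t+1}) - Phi(s_t) for some function Phi of the state alone. In that case every optimal Q-value at state s shifts by the same amount, -Phi(s), so the ranking of actions does not change. The sketch below is illustrative only: the four-state chain environment, the potential values in PHI, and all function names are assumptions that do not come from the source. It runs value iteration under r and under r' and compares the resulting greedy policies.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 4, 2

def step(s, a):
    # Hypothetical deterministic chain: action 0 moves left, action 1 moves right.
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def r(s, a, s_next):
    # Assumed base reward: 1 whenever the agent lands in the last state.
    return 1.0 if s_next == N_STATES - 1 else 0.0

# Arbitrary potential over states (an assumption chosen for illustration).
PHI = [3.0, -2.0, 5.0, 0.5]

def f(s, a, s_next):
    # Potential-based shaping term; non-zero on these transitions.
    return GAMMA * PHI[s_next] - PHI[s]

def r_prime(s, a, s_next):
    return r(s, a, s_next) + f(s, a, s_next)

def optimal_policy(reward, iters=500):
    # Tabular value iteration on Q, followed by a greedy readout per state.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        q = [
            [reward(s, a, step(s, a)) + GAMMA * max(q[step(s, a)])
             for a in range(N_ACTIONS)]
            for s in range(N_STATES)
        ]
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]

pi = optimal_policy(r)
pi_prime = optimal_policy(r_prime)
print(pi, pi_prime, pi == pi_prime)
```

Because many different choices of PHI leave the optimal policy unchanged, observing the policy alone cannot distinguish r from r' in this toy setting.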
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is training an agent and finds that two different reward functions, r_1 and r_2, lead to the agent learning the exact same optimal behavior. The relationship between the two functions is defined as r_2(s_t, a_t, s_{t+1}) = r_1(s_t, a_t, s_{t+1}) + f(s_t, a_t, s_{t+1}) for some non-zero function f. What is the most accurate explanation for this phenomenon?
Analyzing Reward Function Equivalence
Analyzing Reward Function Invariance