1Cademy

Learn Before

Causality Constraint in Reinforcement Learning

Multiple Choice

An autonomous agent completes a task over four time steps. The sequence of actions and resulting rewards is as follows:

Time t=1: Action a_1 -> Reward r_1 = 0
Time t=2: Action a_2 -> Reward r_2 = 0
Time t=3: Action a_3 -> Reward r_3 = -1
Time t=4: Action a_4 -> Reward r_4 = +10

When evaluating the decision to take action a_2 at time t=2, which rewards should be considered potentially influenced by this specific action?
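The causality constraint can be sketched in a few lines of Python. This sketch assumes the common policy-gradient convention that reward r_t is received after action a_t and can therefore depend on it. The function names `rewards_influenced_by` and `reward_to_go` are hypothetical, chosen only for illustration, and the sketch uses an undiscounted sum.

```python
from typing import List


def rewards_influenced_by(rewards: List[float], t: int) -> List[float]:
    """Return the rewards that action a_t can influence (t is 1-indexed).

    Under the causality constraint, a_t cannot change rewards received
    before time t, so only r_t, r_{t+1}, ..., r_T remain.
    Assumption: r_t is observed after a_t, so r_t itself is included.
    """
    if not 1 <= t <= len(rewards):
        raise ValueError("t must be between 1 and len(rewards)")
    return rewards[t - 1:]


def reward_to_go(rewards: List[float], t: int) -> float:
    """Undiscounted sum of the rewards from time t onward."""
    return sum(rewards_influenced_by(rewards, t))


rewards = [0, 0, -1, 10]  # r_1..r_4 from the task above
print(rewards_influenced_by(rewards, 2))  # [0, -1, 10]
print(reward_to_go(rewards, 2))           # 9
```

Slicing off the earlier rewards, rather than weighting them, reflects the idea that past rewards carry no information about whether a_t was a good choice. Including them would only add noise to the policy-gradient estimate.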

Updated 2025-09-28


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Irrelevance of Past Rewards for Policy Gradient Calculation
An agent is learning to play a video game. At time step t=5, the agent performs an action (e.g., jumping). According to the causality principle in this context, this specific action at t=5 can alter the reward that was already received at time step t=3.
Causality Principle in Policy Gradient Calculation
Debugging a Policy Update Calculation
