Learn Before
Calculating State Value in a Deterministic Environment
An agent operates in an environment with three states: A, B, and C. The agent starts in state A and follows a fixed, deterministic policy: from state A, it always moves to state B, and from state B, it always moves to state C. State C is a terminal state, ending the process.
The rewards for these transitions are as follows:
- The transition from A to B yields a reward of +2.
- The transition from B to C yields a reward of +10.
Using a discount factor (γ) of 0.9, calculate the value of the starting state A. Show your calculation based on the formula: V(S) = r₁ + γ·r₂ + γ²·r₃ + ..., the discounted sum of future rewards.
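The calculation asked for above can be sketched in a few lines of Python. This is an illustrative snippet, not part of the original question; the reward list and discount value are taken directly from the problem statement.

```python
# Discounted return for the deterministic trajectory A -> B -> C.
# rewards[0] is the A->B transition (+2), rewards[1] is B->C (+10);
# terminal state C contributes nothing further.
rewards = [2, 10]
gamma = 0.9

# V(A) = r1 + gamma * r2
value_A = sum(gamma**t * r for t, r in enumerate(rewards))
print(value_A)  # 2 + 0.9 * 10 = 11.0
```

The sum has only two terms because the episode ends at C, so no infinite series is needed here.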
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An agent is in a state 'S' and must choose between two policies, Policy A and Policy B. The sequence of rewards the agent will receive after starting in state 'S' and following each policy is deterministic and known:
- Policy A Reward Sequence: [+10, +1, +1, +1, ...]
- Policy B Reward Sequence: [+3, +3, +3, +3, ...]
Given the formula for the value of a state, V(S) = r₁ + γ·r₂ + γ²·r₃ + ..., which of the following statements correctly analyzes the relationship between the discount factor γ and the value of state 'S' for each policy?
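One way to build intuition for this related question is to approximate the two infinite discounted sums numerically. The sketch below is an illustrative assumption, not part of the card: it truncates each series at a long horizon and repeats the last reward, which matches both sequences' eventually-constant tails.

```python
# Approximate the infinite discounted return by truncating at a horizon
# and repeating the final (constant-tail) reward.
def value(rewards, gamma, horizon=1000):
    seq = rewards[:horizon] + [rewards[-1]] * max(0, horizon - len(rewards))
    return sum(gamma**t * r for t, r in enumerate(seq))

# Policy A: one large immediate reward, then +1 forever.
# Policy B: steady +3 forever.
for gamma in (0.5, 0.9):
    v_a = value([10, 1], gamma)
    v_b = value([3], gamma)
    print(gamma, round(v_a, 2), round(v_b, 2))
```

Running this shows the crossover the question is probing: a small γ favors Policy A's large immediate reward (e.g. γ = 0.5 gives V_A ≈ 11 vs V_B ≈ 6), while a large γ favors Policy B's steady stream (γ = 0.9 gives V_A ≈ 19 vs V_B ≈ 30).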
Calculating State Value in a Deterministic Environment
Advantage Function Formula
Temporal Difference (TD) Error as an Advantage Function Estimator
An agent is in a state 'S' and follows a fixed policy. From this state, the environment is stochastic: there is a 50% chance the agent will enter a trajectory with a reward sequence of [+10, 0, 0, ...] and a 50% chance it will enter a different trajectory with a reward sequence of [0, +10, 0, ...]. Given the state-value formula and a discount factor (γ) of 0.9, what is the value of state 'S'?
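For this stochastic variant, the value of 'S' is the probability-weighted expectation of the discounted return of each trajectory. The following sketch (illustrative, not part of the card) makes that expectation explicit:

```python
# V(S) = E[discounted return] over the two equally likely trajectories.
gamma = 0.9
trajectories = {
    (10, 0, 0): 0.5,  # reward sequence [+10, 0, 0, ...]
    (0, 10, 0): 0.5,  # reward sequence [0, +10, 0, ...]
}

def discounted_return(rewards, gamma):
    return sum(gamma**t * r for t, r in enumerate(rewards))

v_s = sum(p * discounted_return(r, gamma) for r, p in trajectories.items())
print(v_s)  # 0.5 * 10 + 0.5 * (0.9 * 10) = 9.5
```

The second trajectory's +10 arrives one step later, so it is discounted once (0.9 × 10 = 9) before the two outcomes are averaged.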