Learn Before
Advantage Function Formula
The advantage function, $A(s_t, a_t)$, measures the relative benefit of taking a specific action in a state compared to the expected value from that state onward. It is calculated by subtracting the state-value function, $V(s_t)$, which acts as a baseline ($b$), from the sum of future rewards. The formula is:

$$A(s_t, a_t) = \sum_{t'=t}^{T} r_{t'} - V(s_t)$$
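As a rough illustration of the description above, the advantage can be sketched as the sum of observed future rewards minus a baseline state value. The function name `advantage` and the individual reward values below are hypothetical and chosen only for illustration. The sketch assumes an undiscounted sum over a finite episode.

```python
from typing import Sequence


def advantage(rewards: Sequence[float], state_value: float) -> float:
    """Sum of future rewards minus the baseline V(s).

    Assumes an undiscounted, finite reward sequence (an illustrative
    simplification, not a definitive implementation).
    """
    return sum(rewards) - state_value


# Hypothetical episode: rewards collected after taking the action.
rewards = [2.0, 3.0, 5.0]   # sums to 10
baseline = 15.0             # assumed pre-computed V(s)

print(advantage(rewards, baseline))  # -5.0
```

A negative result means the action did worse than the average outcome expected from that state. A positive result means it did better.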
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
An agent is in a state 'S' and must choose between two policies, Policy A and Policy B. The sequence of rewards the agent will receive after starting in state 'S' and following each policy is deterministic and known:
- Policy A Reward Sequence: [+10, +1, +1, +1, ...]
- Policy B Reward Sequence: [+3, +3, +3, +3, ...]
Given the formula for the value of a state, which of the following statements correctly analyzes the relationship between the discount factor γ and the value of state 'S' for each policy?
Calculating State Value in a Deterministic Environment
Advantage Function Formula
Temporal Difference (TD) Error as an Advantage Function Estimator
An agent is in a state 'S' and follows a fixed policy. From this state, the environment is stochastic: there is a 50% chance the agent will enter a trajectory with a reward sequence of [+10, 0, 0, ...] and a 50% chance it will enter a different trajectory with a reward sequence of [0, +10, 0, ...]. Given the state-value formula and a discount factor (γ) of 0.9, what is the value of state 'S'?
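The state-value calculations referenced in the related items above can be sketched as a probability-weighted average of discounted returns. This is a minimal sketch: the helper names `discounted_return` and `expected_state_value` are hypothetical. It assumes the common convention that the first reward is undiscounted (weight γ^0) and that the trailing "..." rewards are all zero, so the sequences can be truncated.

```python
from typing import Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * r_k, with the first reward at k = 0."""
    return sum(r * gamma**k for k, r in enumerate(rewards))


def expected_state_value(
    trajectories: Sequence[Tuple[float, Sequence[float]]], gamma: float
) -> float:
    """Average the discounted return of each trajectory by its probability."""
    return sum(p * discounted_return(rs, gamma) for p, rs in trajectories)


# Two equally likely trajectories, truncated after the last nonzero reward
# (assumes every later reward is zero).
trajectories = [
    (0.5, [10.0, 0.0, 0.0]),
    (0.5, [0.0, 10.0, 0.0]),
]

value = expected_state_value(trajectories, gamma=0.9)
print(value)
```

Under these assumptions, the first trajectory contributes 0.5 × 10 and the second contributes 0.5 × 0.9 × 10, so delaying a reward by one step lowers its contribution by a factor of γ.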
Learn After
Evaluating an Action's Performance
An agent in a given state s takes an action a. The sequence of rewards it receives from that point until the end of the episode sums to a total of 10. The pre-calculated value for state s, representing the average expected sum of future rewards from that state, is 15. Based on this information, what can be concluded about the action a?
Comparing Action Quality