Learn Before
Action-Value Function Formula
The action-value function, or Q-value function, denoted $Q^{\pi}(s, a)$, gives the expected return when an agent starts in a specific state $s$, immediately takes a particular action $a$, and then follows a given policy $\pi$ for all subsequent decisions. It is formally defined as an expectation over all possible future trajectories:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\ a_0 = a\right]$$
Here, $s_0 = s$ designates the starting state, and $a_0 = a$ specifies the initial action taken. The parameter $\gamma$ represents the discount factor applied to future rewards, and $r_t$ is the reward obtained at time step $t$.
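The expectation in this definition can be approximated by averaging discounted returns over sampled trajectories. The following is a minimal Monte Carlo sketch, not a definitive implementation: the environment `toy_trajectory`, its action names `"safe"` and `"risky"`, and its reward values are all hypothetical, and the policy followed after the first action is assumed to be folded into the trajectory sampler.

```python
import random


def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * r_t for one trajectory's reward list."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def estimate_q(sample_trajectory, state, action, gamma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the action value for (state, action).

    sample_trajectory(state, action, rng) must return the rewards
    [r_0, r_1, ...] of one episode that starts in `state`, takes `action`
    first, and then follows the policy being evaluated.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += discounted_return(sample_trajectory(state, action, rng), gamma)
    return total / n_samples


def toy_trajectory(state, action, rng):
    """Hypothetical environment used only for illustration."""
    if action == "safe":
        # One step, reward 1, then the episode ends.
        return [1.0]
    # "risky": no reward now, then reward 4 with probability 0.5 at the next step.
    return [0.0, 4.0 if rng.random() < 0.5 else 0.0]


if __name__ == "__main__":
    for action in ("safe", "risky"):
        q = estimate_q(toy_trajectory, "s", action, gamma=0.9)
        print(f"Estimated Q(s, {action}) = {q:.3f}")
```

In this toy setup the exact values are 1 for `"safe"` and 0.9 × 0.5 × 4 = 1.8 for `"risky"`, so the sampled estimate for `"risky"` should land close to 1.8.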

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
An agent is learning to play a game where the goal is to maximize its final score. At a specific point in the game (a 'state'), the agent is considering several possible moves ('actions'). A function is used to estimate the total expected future score that can be achieved by taking a specific action from the current state and then playing optimally for the rest of the game. What does the output of this function represent for a given state-action pair?
Utility of Action-Specific Value Estimation
Choosing the Right Tool for Decision-Making
Learn After
Advantage Function Definition
An agent is being trained in an environment where it must choose between two initial actions from the same starting position. Action A leads to a short sequence of steps resulting in a small, immediate reward. Action B leads to a much longer sequence of steps resulting in a large, delayed reward. According to the action-value function formula, which calculates the expected total discounted reward for taking an action in a state, how would decreasing the discount factor (γ) from a high value (e.g., 0.99) to a very low value (e.g., 0.1) most likely influence the agent's learned behavior?
Calculating Action-Values in a Simple Environment
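Match each component of the action-value function formula, $Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_0 = s,\ a_0 = a\right]$, with its correct description.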
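
The discount-factor question above can be checked numerically. The sketch below assumes deterministic rewards and made-up numbers: Action A pays 1 immediately, while Action B pays 10 after nine zero-reward steps. These values are illustrative assumptions, not taken from the course material.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * r_t for one trajectory's reward list."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical reward sequences following each initial action.
REWARDS_A = [1.0]                # small, immediate reward
REWARDS_B = [0.0] * 9 + [10.0]   # large reward delayed until step 9


def preferred_action(gamma):
    """Return the action with the larger discounted return at this gamma."""
    q_a = discounted_return(REWARDS_A, gamma)
    q_b = discounted_return(REWARDS_B, gamma)
    return "A" if q_a > q_b else "B"


if __name__ == "__main__":
    for gamma in (0.99, 0.1):
        q_a = discounted_return(REWARDS_A, gamma)
        q_b = discounted_return(REWARDS_B, gamma)
        print(f"gamma={gamma}: Q(s, A)={q_a:.4g}, Q(s, B)={q_b:.4g}, "
              f"preferred={preferred_action(gamma)}")
```

With these numbers, the delayed reward keeps most of its value at a discount factor of 0.99, so B has the higher action value. At 0.1 the factor $\gamma^{9}$ shrinks B's reward to almost nothing, and the immediate reward from A wins.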