
State-Value and Action-Value Functions

In reinforcement learning, value functions are crucial for estimating the long-term desirability of states or actions. They quantify the expected return, i.e., the total (discounted) reward the agent expects to accumulate over time. The two main types are:

  1. State-Value Function ($v_\pi$): Also known as the value function, this assesses the expected discounted return (i.e., accumulated rewards) for an agent starting from a particular state $s$ and following a specific policy $\pi$. The expectation is taken over all possible trajectories originating from that state.

  2. Action-Value Function ($q_\pi$): Also known as the Q-value function, this measures the expected return if an agent begins in state $s$, performs action $a$, and subsequently adheres to policy $\pi$.
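In standard textbook notation (a sketch; the return $G_t$ and expectation notation are assumptions beyond what the card itself writes out), the two definitions above read:

$$
v_\pi(s) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right]
$$

$$
q_\pi(s, a) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s,\; A_t = a\right]
$$

The only difference is the extra conditioning on the first action $A_t = a$ in $q_\pi$; after that first action, both expectations follow the same policy $\pi$.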

A key element in these calculations is the discount factor, $\gamma$ (where $0 \le \gamma \le 1$), which adjusts the importance of future rewards.
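A minimal sketch of how the discount factor weights future rewards (the function name and sample reward sequence are illustrative, not from the card):

```python
def discounted_return(rewards, gamma):
    """Compute G = r_1 + gamma*r_2 + gamma^2*r_3 + ... for one trajectory.

    This is the quantity whose expectation defines both v_pi and q_pi.
    Accumulating backwards avoids computing gamma**k explicitly.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 1.0))  # 3.0 -- undiscounted: all rewards count equally
print(discounted_return(rewards, 0.5))  # 1.75 = 1 + 0.5*1 + 0.25*1 -- later rewards matter less
```

Averaging `discounted_return` over many trajectories sampled from a policy $\pi$ starting in state $s$ gives a Monte Carlo estimate of $v_\pi(s)$.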

Updated 2026-05-02

Tags

Data Science

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences