Action-Value Function Formula

The action-value function, or Q-value function, denoted $Q(s,a)$, gives the expected return when an agent starts in a specific state $s$, immediately takes a particular action $a$, and then follows a given policy $\pi$ for all subsequent decisions. It is formally defined as an expectation over all possible future trajectories:

$$Q(s,a) = \mathbb{E} \Big[ \sum_{t=0}^{\infty} \gamma^{t} r_t \ \big| \ s_0 = s, a_0 = a, \pi \Big]$$

Here, $s_0 = s$ designates the starting state and $a_0 = a$ the initial action. The parameter $\gamma$ is the discount factor applied to future rewards (typically $0 \le \gamma < 1$, which keeps the infinite sum finite when rewards are bounded), and $r_t$ is the reward obtained at time step $t$.
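The expectation above can be approximated by Monte Carlo sampling: start in $s$, take $a$ once, follow $\pi$ afterwards, and average the discounted returns of many sampled trajectories. The sketch below assumes a hypothetical two-state MDP (`TRANSITIONS`), a hypothetical fixed policy (`policy`), and truncates the infinite sum at a finite `horizon`. None of these names come from the source; they exist only to illustrate the definition.

```python
import random

# Hypothetical toy MDP (illustrative assumption, not from the source):
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {"left": [(1.0, "A", 0.0)], "right": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "B", 2.0)]},
}


def discounted_return(rewards, gamma):
    """Compute sum_t gamma**t * r_t for a finite reward sequence, t starting at 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def step(state, action, rng):
    """Sample (next_state, reward) from the toy transition distribution."""
    outcomes = TRANSITIONS[state][action]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=probs, k=1)[0]
    return next_state, reward


def policy(state, rng):
    """Hypothetical fixed stochastic policy pi(a | s): 'right' with probability 0.7."""
    return "right" if rng.random() < 0.7 else "left"


def mc_q_estimate(s, a, gamma=0.9, episodes=5000, horizon=100, seed=0):
    """Monte Carlo estimate of Q(s, a): take a in s, then follow pi.

    The infinite sum is truncated at `horizon` steps; with gamma=0.9 and
    horizon=100 the neglected tail is tiny for this bounded-reward MDP.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state, action = s, a  # s_0 = s, a_0 = a
        rewards = []
        for _ in range(horizon):
            state, r = step(state, action, rng)
            rewards.append(r)
            action = policy(state, rng)  # every later action comes from pi
        total += discounted_return(rewards, gamma)
    return total / episodes
```

For this particular toy MDP and policy, solving the Bellman equations exactly gives $Q(B, \text{right}) \approx 12.26$ and $Q(B, \text{left}) \approx 9.40$ for $\gamma = 0.9$, so `mc_q_estimate("B", "right")` should land close to 12.26. Setting `gamma=0.0` reduces $Q(s,a)$ to the expected immediate reward, which is a quick sanity check.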

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences