Learn Before
  • Action-Value Function Formula

Case Study

Calculating Action-Values in a Simple Environment

Based on the scenario provided, calculate the expected total discounted reward for both Action A and Action B. Which action is preferable according to these calculations?
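The scenario's reward values are not reproduced on this page. The sketch below therefore uses assumed numbers to show the calculation. It assumes a deterministic environment in which each action produces one fixed reward sequence, so the expectation in the action-value formula reduces to a single discounted sum. The reward lists and γ = 0.9 are hypothetical placeholders. Replace them with the values from the scenario.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * r_t over a finite reward sequence (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# HYPOTHETICAL reward sequences, not taken from the original scenario.
# In a deterministic environment the expectation over trajectories equals
# the return of the single trajectory that each action produces.
rewards_a = [1.0, 0.0, 2.0]        # hypothetical rewards after taking Action A
rewards_b = [0.0, 0.0, 0.0, 5.0]   # hypothetical rewards after taking Action B
gamma = 0.9                        # hypothetical discount factor

q_a = discounted_return(rewards_a, gamma)
q_b = discounted_return(rewards_b, gamma)

# The preferable action is the one with the larger expected discounted return.
if q_a > q_b:
    preferred = "Action A"
elif q_b > q_a:
    preferred = "Action B"
else:
    preferred = "either (tie)"

print(f"Q(s, A) = {q_a:.3f}, Q(s, B) = {q_b:.3f}, preferred: {preferred}")
```

With these placeholder numbers, Q(s, A) = 1 + 0.9²·2 = 2.62 and Q(s, B) = 0.9³·5 = 3.645. Action B would be preferred.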


Updated 2025-10-04

Contributors are:

Gemini AI

Who are from:

Google

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

Related
  • Advantage Function Definition

  • An agent is being trained in an environment where it must choose between two initial actions from the same starting position. Action A leads to a short sequence of steps resulting in a small, immediate reward. Action B leads to a much longer sequence of steps resulting in a large, delayed reward. According to the action-value function formula, which calculates the expected total discounted reward for taking an action in a state, how would decreasing the discount factor (γ) from a high value (e.g., 0.99) to a very low value (e.g., 0.1) most likely influence the agent's learned behavior?

  • Calculating Action-Values in a Simple Environment

  • Match each component of the action-value function formula, $Q(s, a) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, a_0 = a, \pi\right]$, with its correct description.
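The discount-factor question in the list above can be illustrated with a small sketch. The reward sequences are hypothetical: Action A yields a reward of 1 one step later, and Action B yields a reward of 10 nine steps later. The environment is assumed deterministic, so each Q-value is a single discounted sum. Under these assumptions a high γ favours the large delayed reward. At γ = 0.1, however, that reward is scaled by 0.1⁹, so the small immediate reward wins.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * r_t over a finite reward sequence (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# HYPOTHETICAL trajectories illustrating the short/small vs long/large trade-off.
short_small = [0.0, 1.0]           # Action A: reward 1 after one step
long_large = [0.0] * 9 + [10.0]    # Action B: reward 10 after nine steps

preferences = {}
for gamma in (0.99, 0.1):
    q_a = discounted_return(short_small, gamma)
    q_b = discounted_return(long_large, gamma)
    # Record which action has the larger discounted return at this gamma.
    preferences[gamma] = "A" if q_a > q_b else "B"
    print(f"gamma={gamma}: Q(A)={q_a:.4g}, Q(B)={q_b:.4g} -> prefer {preferences[gamma]}")
```

Under these assumptions, lowering γ makes the agent increasingly myopic. Rewards that arrive many steps later contribute almost nothing to Q(s, a).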

© 1Cademy 2026

We're committed to OpenSource on

Github