Short Answer

Calculating State Value in a Deterministic Environment

An agent operates in an environment with three states: A, B, and C. The agent starts in state A and follows a fixed, deterministic policy: from state A, it always moves to state B, and from state B, it always moves to state C. State C is a terminal state, ending the process.

The rewards for these transitions are as follows:

  • The transition from A to B yields a reward of +2.
  • The transition from B to C yields a reward of +10.

Using a discount factor (γ) of 0.9, calculate the value of the starting state A. Show your calculation based on the formula: V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, \pi\right].
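Because both the policy and the transitions are deterministic, the expectation reduces to a single discounted sum over the one possible trajectory A → B → C. A minimal sketch of that arithmetic follows, assuming r_t denotes the reward received for the transition taken at step t (so r_0 = 2 and r_1 = 10) and that no reward follows the terminal state C. The helper name `discounted_return` is hypothetical and used only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**t * r_t over a fixed reward sequence (t starts at 0)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


gamma = 0.9

# Rewards along the only trajectory from A: A -> B gives +2, B -> C gives +10.
# Assumption: the episode ends at C, so there are no further rewards.
rewards_from_A = [2.0, 10.0]

value_A = discounted_return(rewards_from_A, gamma)
print(f"V(A) = {value_A:.2f}")
```

Under these assumptions the sum is V(A) = 2 + 0.9 × 10 = 11. The same helper applied to the rewards from B alone, `[10.0]`, gives V(B) = 10, which matches the recursive form V(A) = 2 + 0.9 × V(B).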

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science