Multiple Choice

An agent is in a state 'S' and follows a fixed policy. From this state, the environment is stochastic: there is a 50% chance the agent will enter a trajectory with a reward sequence of [+10, 0, 0, ...] and a 50% chance it will enter a different trajectory with a reward sequence of [0, +10, 0, ...]. Given the state-value formula $V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, \pi\right]$ and a discount factor (γ) of 0.9, what is the value of state 'S'?
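
One way to check the arithmetic is to evaluate the expectation directly: discount each trajectory's rewards, then weight each result by its probability. The sketch below is a minimal illustration, not part of the original question. The helper names `discounted_return` and `state_value` are hypothetical, and the reward sequences are truncated to three steps on the assumption that every later reward is 0.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite prefix of the reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def state_value(branches, gamma):
    """Expected discounted return over (probability, rewards) branches."""
    return sum(p * discounted_return(rewards, gamma) for p, rewards in branches)


GAMMA = 0.9

# Assumption: rewards after the shown prefix are all zero, so truncating
# the infinite sequences does not change the discounted sum.
BRANCHES = [
    (0.5, [10, 0, 0]),  # reward arrives at t = 0, discounted by gamma**0
    (0.5, [0, 10, 0]),  # reward arrives at t = 1, discounted by gamma**1
]

value_of_s = state_value(BRANCHES, GAMMA)
print(value_of_s)
```

The first branch contributes 0.5 × 10 and the second contributes 0.5 × 0.9 × 10, so the delayed reward is worth less only because of the single extra discounting step.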

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science