Learn Before
Case Study

Evaluating Actions with a State-Value Baseline

In a reinforcement learning scenario, an agent uses the expected future reward from its current state as a baseline to evaluate its actions. This baseline helps determine if an action was better or worse than average for that specific state. Based on the case study below, calculate the resulting value used for the policy update for both Trajectory A and Trajectory B, and explain what each value signifies about the action taken.

0

1

Updated 2025-10-03

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science