Short Answer

Comparing Action Quality

An agent is in a specific situation where the average expected sum of future rewards is +2. The agent tries two different actions on separate occasions. After taking Action 1, the actual sum of future rewards it receives is +5. After taking Action 2, the actual sum of future rewards is -10. For which action is the 'advantage' value higher, and what does this value signify about that action's quality relative to the average expectation from that situation?
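
The arithmetic behind the question can be sketched in a few lines. This assumes the advantage is estimated the usual Monte Carlo way: the single observed return G stands in for the action's value, and the situation's expected return V(s) = +2 serves as the baseline, so A = G - V(s). The helper name `advantage` and the variable names are illustrative only.

```python
def advantage(observed_return: float, state_value: float) -> float:
    """Monte Carlo advantage estimate (assumed here): observed return minus the state's baseline value."""
    return observed_return - state_value

# Expected sum of future rewards from the situation (the state value V(s)).
V_s = 2.0
# Actual sums of future rewards observed after each action.
returns = {"Action 1": 5.0, "Action 2": -10.0}

# Advantage of each action relative to the situation's average expectation.
advantages = {action: advantage(g, V_s) for action, g in returns.items()}

for action, a in advantages.items():
    if a > 0:
        verdict = "better than average"
    elif a < 0:
        verdict = "worse than average"
    else:
        verdict = "exactly average"
    print(f"{action}: advantage = {a:+.1f} ({verdict})")
# Action 1: advantage = +3.0 (better than average)
# Action 2: advantage = -12.0 (worse than average)
```

A positive advantage means the action did better than the situation's baseline expectation, and a negative one means it did worse. Because each estimate comes from a single sampled return, it is noisy. In practice such estimates are averaged over many samples or combined with a learned value function.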

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science