Learn Before
Comparing Action Quality
An agent is in a specific situation where the average expected sum of future rewards is +2. The agent tries two different actions on separate occasions. After taking Action 1, the actual sum of future rewards it receives is +5. After taking Action 2, the actual sum of future rewards is -10. For which action is the 'advantage' value higher, and what does this value signify about that action's quality relative to the average expectation from that situation?
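The comparison can be checked numerically. The following is a minimal sketch, assuming the advantage is estimated from a single sample as the observed sum of future rewards minus the average expected sum for the situation. The function name `advantage` and the variable names are hypothetical, chosen only for illustration.

```python
def advantage(actual_return: float, state_value: float) -> float:
    """Single-sample advantage estimate: observed return minus the
    average expected return from the same situation (assumed definition)."""
    return actual_return - state_value


# Numbers taken from the question above.
state_value = 2.0  # average expected sum of future rewards
returns = {"Action 1": 5.0, "Action 2": -10.0}  # observed sums of future rewards

advantages = {name: advantage(g, state_value) for name, g in returns.items()}

for name, adv in advantages.items():
    print(f"{name}: advantage = {adv:+.1f}")

# The action with the larger advantage did better relative to the average.
best = max(advantages, key=advantages.get)
print("Higher advantage:", best)
```

A positive value means the action turned out better than the average expectation from that situation, and a negative value means it turned out worse.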
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating an Action's Performance
An agent in a given state s takes an action a. The sequence of rewards it receives from that point until the end of the episode sums to a total of 10. The pre-calculated value for state s, representing the average expected sum of future rewards from that state, is 15. Based on this information, what can be concluded about the action a?
Comparing Action Quality