Multiple Choice

An agent is being trained in an environment where it must choose between two initial actions from the same starting position. Action A leads to a short sequence of steps resulting in a small, immediate reward. Action B leads to a much longer sequence of steps resulting in a large, delayed reward. According to the action-value function formula, which calculates the expected total discounted reward for taking an action in a state, how would decreasing the discount factor (γ) from a high value (e.g., 0.99) to a very low value (e.g., 0.1) most likely influence the agent's learned behavior?
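To make the comparison concrete, the sketch below computes the discounted return of each path under both discount factors. The reward sizes and step counts are hypothetical, chosen only for illustration (the question gives no numbers), and it assumes deterministic paths, so the expected return reduces to a plain sum of gamma**t * r_t.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence that starts at t = 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical paths (assumptions, not taken from the question):
# Action A: small reward of 1 arriving at step 2.
# Action B: large reward of 10 arriving at step 19.
path_a = [0.0, 0.0, 1.0]
path_b = [0.0] * 19 + [10.0]


def preferred_action(gamma):
    """Return the action with the higher discounted return for this gamma."""
    q_a = discounted_return(path_a, gamma)
    q_b = discounted_return(path_b, gamma)
    return "A" if q_a > q_b else "B"


if __name__ == "__main__":
    for gamma in (0.99, 0.1):
        q_a = discounted_return(path_a, gamma)
        q_b = discounted_return(path_b, gamma)
        print(f"gamma={gamma}: Q(A)={q_a:.3g}, Q(B)={q_b:.3g}, "
              f"prefers {preferred_action(gamma)}")
```

With these assumed numbers, a high discount factor leaves the distant reward nearly intact, while a low one shrinks it by a factor of gamma at every step, so the ranking of the two actions flips.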

Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science