Comparing Reward Structures in AI Training
Consider two scenarios for training an AI agent. In Scenario A, an agent learns to navigate a maze and receives a small positive reward for every step that brings it closer to the exit and a small negative reward for hitting a wall. In Scenario B, an agent learns to write a short story and only receives a reward after the entire story is written, based on its overall quality. Compare the reward structures in these two scenarios. Identify which scenario uses a dense reward structure and which uses a sparse one, and analyze the primary training challenge associated with the sparse reward scenario.
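As a starting point for the comparison, the two feedback patterns can be sketched in code. This is only an illustration under stated assumptions: the function names, the reward magnitudes (±0.1), and the single `quality_score` number are hypothetical and not specified by the scenarios above.

```python
from typing import List


def maze_step_rewards(distances: List[int], hit_wall: List[bool]) -> List[float]:
    """Scenario A: one reward per step taken in the maze.

    distances[t] is the agent's distance to the exit after step t
    (distances[0] is the starting distance); hit_wall[t] says whether
    step t bumped into a wall. The +/-0.1 magnitudes are assumed.
    """
    rewards = []
    for t in range(1, len(distances)):
        r = 0.0
        if distances[t] < distances[t - 1]:
            r += 0.1  # moved closer to the exit
        if hit_wall[t]:
            r -= 0.1  # hit a wall
        rewards.append(r)
    return rewards


def story_rewards(num_tokens: int, quality_score: float) -> List[float]:
    """Scenario B: no feedback until the whole story has been written.

    Every position gets 0.0 except the last, which carries the
    (hypothetical) overall quality score.
    """
    if num_tokens <= 0:
        return []
    return [0.0] * (num_tokens - 1) + [quality_score]


maze = maze_step_rewards([5, 4, 4, 3, 2], [False, False, True, False, False])
story = story_rewards(5, 0.8)
print(maze)   # feedback arrives at every step
print(story)  # feedback arrives only at the end
```

Counting how many entries in each list are non-zero, and asking which individual actions each non-zero entry can be traced back to, is one way to begin the analysis the question asks for.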
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI agent is being trained to play a complex board game. The agent only receives a reward signal at the very end of the game: +1 for a win, -1 for a loss, and 0 for a draw. No feedback is given for any of the individual moves made during the game. Which of the following best describes this reward structure and its primary challenge for training the agent?
Reward System Design for a Summarization Agent