Learn Before
An AI agent is being trained to play a complex board game. The agent receives a reward signal only at the very end of the game: +1 for a win, -1 for a loss, and 0 for a draw. No feedback is given for any of the individual moves made during the game. Which of the following best describes this reward structure and its primary challenge for training the agent?
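To make the setup concrete, the reward signal described in the question can be sketched as a per-move list of rewards for one game. This is only an illustration of the scenario as stated, not part of the original question: the function name `episode_rewards`, the outcome labels "win", "loss", and "draw", and the `num_moves` parameter are all assumptions introduced here.

```python
from typing import List

# Hypothetical outcome labels; the question only specifies the reward values.
OUTCOME_REWARD = {"win": 1.0, "loss": -1.0, "draw": 0.0}


def episode_rewards(num_moves: int, outcome: str) -> List[float]:
    """Return the reward the agent observes after each of its moves in one game."""
    if num_moves < 1:
        raise ValueError("an episode needs at least one move")
    # No feedback is given for individual moves during the game.
    rewards = [0.0] * num_moves
    # The only signal arrives with the final move, based on the game result.
    rewards[-1] = OUTCOME_REWARD[outcome]
    return rewards
```

For a 40-move game that ends in a loss, `episode_rewards(40, "loss")` would give 39 zeros followed by a single -1.0, so the agent sees the same feedback for every move except the last.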
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward System Design for a Summarization Agent
Comparing Reward Structures in AI Training