Google

Reinforcement learning feedback can be categorized by how often it arrives. Dense rewards are provided immediately and frequently, for example after every action, which generally makes policy training easier and more efficient. Sparse rewards, in contrast, arrive rarely, often only once the task is complete. Although dense feedback is usually preferred, many scenarios, particularly in NLP, are inherently sparse: a generated text can often be scored only after it is finished.
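A minimal sketch of the distinction, assuming a hypothetical one-dimensional corridor with the goal at position `GOAL`. The reward values (plus or minus 0.1 per move for dense feedback, +1 on completion for sparse feedback) and the helper names are illustrative choices, not taken from any particular system. Both reward functions are applied to the same sequence of moves so that only the feedback frequency differs.

```python
from typing import Callable, List

GOAL = 5  # hypothetical corridor: positions 0..GOAL, goal at the right end


def dense_reward(old: int, new: int) -> float:
    """Dense: immediate feedback after every move (closer = +0.1, otherwise -0.1)."""
    return 0.1 if abs(GOAL - new) < abs(GOAL - old) else -0.1


def sparse_reward(old: int, new: int) -> float:
    """Sparse: feedback only when the task is completed."""
    return 1.0 if new == GOAL else 0.0


def rollout(actions: List[int], reward_fn: Callable[[int, int], float]) -> List[float]:
    """Apply moves (-1 = left, +1 = right) from position 0 and collect per-step rewards."""
    pos, rewards = 0, []
    for action in actions:
        new = max(0, min(GOAL, pos + action))  # clamp the move to the corridor
        rewards.append(reward_fn(pos, new))
        pos = new
        if pos == GOAL:  # episode ends when the goal is reached
            break
    return rewards


actions = [1, -1, 1, 1, 1, 1, 1]
print(rollout(actions, dense_reward))   # [0.1, -0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(rollout(actions, sparse_reward))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

The dense version tells the agent after every single move whether it helped; the sparse version gives one nonzero number for the whole episode, so the mistaken second move is never singled out.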

Dense vs. Sparse Rewards

An AI agent is being trained to play a complex board game. The agent only receives a reward signal at the very end of the game: +1 for a win, -1 for a loss, and 0 for a draw. No feedback is given for any of the individual moves made during the game. Which of the following best describes this reward structure and its primary challenge for training the agent?
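One way to see why this structure is hard to learn from is to compute the discounted return each move receives. This is a minimal sketch assuming the standard backward recursion G_t = r_t + gamma * G_{t+1} with a hypothetical discount factor `gamma`; `terminal_rewards` and `discounted_returns` are illustrative helper names, not part of any library.

```python
from typing import List


def terminal_rewards(num_moves: int, outcome: str) -> List[float]:
    """Sparse board-game reward: 0 for every move except the last one."""
    if num_moves < 1:
        raise ValueError("a game must contain at least one move")
    final = {"win": 1.0, "loss": -1.0, "draw": 0.0}[outcome]
    return [0.0] * (num_moves - 1) + [final]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards from the last move."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns(terminal_rewards(4, "win"), gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
print(discounted_returns(terminal_rewards(4, "draw"), gamma=0.5))
# [0.0, 0.0, 0.0, 0.0]
```

Every move in a won game receives a positive return and every move in a lost game a negative one, with the size depending only on how far the move is from the end, not on whether that particular move was good. In a draw, no move receives any signal at all. Working out which moves actually mattered from a single end-of-game number is the credit assignment problem.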

An engineer is training an AI agent to generate one-sentence product summaries. They are considering two different methods for providing feedback to the agent during training. Evaluate the two approaches described in the case study. Which approach is likely to result in faster initial training, and what is a significant potential drawback of that same approach?

Reward System Design for a Summarization Agent

Consider two scenarios for training an AI agent. In Scenario A, an agent learns to navigate a maze and receives a small positive reward for every step that brings it closer to the exit and a small negative reward for hitting a wall. In Scenario B, an agent learns to write a short story and only receives a reward after the entire story is written, based on its overall quality. Compare the reward structures in these two scenarios. Identify which scenario uses a dense reward structure and which uses a sparse one, and analyze the primary training challenge associated with the sparse reward scenario.
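A second challenge of sparse rewards is exploration: an untrained agent may rarely, if ever, reach the point where any reward is given. The sketch below estimates this with a uniformly random policy on a hypothetical corridor that stands in for Scenario A's maze, since a story-writing task cannot be simulated this simply. The corridor length `GOAL`, the step budget `MAX_STEPS`, the reward values, and all function names are assumptions chosen for illustration.

```python
import random
from typing import Callable

GOAL = 8        # hypothetical corridor: positions 0..GOAL, goal at the right end
MAX_STEPS = 16  # hypothetical step budget per episode

RewardFn = Callable[[int, int], float]


def dense_reward(old: int, new: int) -> float:
    # Scenario A style: +0.1 for moving closer; -0.1 for moving away or bumping the wall at 0
    return 0.1 if new > old else -0.1


def sparse_reward(old: int, new: int) -> float:
    # Scenario B style: a single signal only when the task is finished
    return 1.0 if new == GOAL else 0.0


def episode_has_signal(reward_fn: RewardFn, rng: random.Random) -> bool:
    """Run one random episode and report whether any nonzero reward appeared."""
    pos = 0
    for _ in range(MAX_STEPS):
        new = max(0, min(GOAL, pos + rng.choice((-1, 1))))  # random move, clamped
        if reward_fn(pos, new) != 0.0:
            return True
        pos = new
        if pos == GOAL:
            break
    return False


def signal_rate(reward_fn: RewardFn, episodes: int = 1000, seed: int = 0) -> float:
    """Fraction of random episodes that produce at least one learning signal."""
    rng = random.Random(seed)
    hits = sum(episode_has_signal(reward_fn, rng) for _ in range(episodes))
    return hits / episodes


print("dense :", signal_rate(dense_reward))
print("sparse:", signal_rate(sparse_reward))
```

The dense rate is always 1.0, because the very first move already yields a plus or minus 0.1 signal. Under the sparse reward, most random episodes end without any feedback, so the agent has nothing to learn from until it stumbles on a success.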

