Learn Before
Reward System Design for a Summarization Agent
An engineer is training an AI agent to generate one-sentence product summaries and is considering two different approaches for providing feedback to the agent during training. Evaluate the two approaches described in the case study. Which approach is likely to result in faster initial training, and what is a significant potential drawback of that same approach?
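Since the case study text itself is not reproduced on this card, the sketch below illustrates the kind of contrast such questions typically pose, using hypothetical names and a made-up word-overlap heuristic: a dense automatic reward computed after every generated word versus a single sparse judgment on the finished summary.

```python
# Hypothetical sketch of two feedback schemes a case study might contrast.
# The function names and the word-overlap heuristic are illustrative
# assumptions, not the actual schemes from the case study.

def dense_heuristic_reward(partial_summary, reference):
    """Scheme A: automatic reward after every generated word.

    Cheap word-overlap with a reference summary gives frequent feedback,
    which tends to speed up initial training. But it is a proxy that is
    easy to game: stuffing in reference words maximizes the score
    without guaranteeing a coherent one-sentence summary.
    """
    ref_words = set(reference.lower().split())
    words = partial_summary.lower().split()
    if not words:
        return 0.0
    return sum(w in ref_words for w in words) / len(words)

def sparse_outcome_reward(final_summary, judged_good):
    """Scheme B: a single human judgment, only after the full summary."""
    return 1.0 if judged_good else 0.0
```

The dense scheme supplies a learning signal at every step (faster initial progress), while its drawback is reward hacking: the agent can maximize the proxy metric rather than the true goal of a good summary.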
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI agent is being trained to play a complex board game. The agent only receives a reward signal at the very end of the game: +1 for a win, -1 for a loss, and 0 for a draw. No feedback is given for any of the individual moves made during the game. Which of the following best describes this reward structure and its primary challenge for training the agent?
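The terminal-only reward structure described above can be sketched as a minimal reward function (an assumed interface for illustration, not tied to any specific RL library):

```python
# Sketch of a sparse, terminal-only reward: every intermediate move
# earns 0, and only the final game state is scored (+1 win, -1 loss,
# 0 draw). The function signature is an illustrative assumption.

def terminal_reward(game_over, result=None):
    """Return the reward for a single time step of the game."""
    if not game_over:
        # No feedback during the game: the agent cannot tell which
        # individual moves were good or bad (credit-assignment problem).
        return 0.0
    return {"win": 1.0, "loss": -1.0, "draw": 0.0}[result]

# A 50-move game yields 49 zero rewards and one informative signal,
# so the agent must infer which of its 50 moves deserve the credit.
rewards = [terminal_reward(False) for _ in range(49)]
rewards.append(terminal_reward(True, "win"))
```

This sparsity is the primary training challenge: a single end-of-game signal must somehow be attributed back across every decision that produced it.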
Reward System Design for a Summarization Agent
Comparing Reward Structures in AI Training