Improving Learning for a Maze-Solving Agent
Based on the provided scenario, describe a specific technique to modify the reward signal distribution to accelerate the agent's learning. Explain how you would assign a reward value to each individual move within a successful path.
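One possible technique, sketched here as an illustration rather than as the expected answer, is to propagate the single end-of-maze reward backward along the successful path with a discount factor. Each move then receives its own value, and moves closer to the exit receive more credit. The function name `per_move_rewards`, the terminal reward of 1.0, and the discount factor `gamma = 0.9` are assumptions made for this example; the scenario does not specify them.

```python
def per_move_rewards(num_moves, final_reward=1.0, gamma=0.9):
    """Spread one terminal reward over every move of a successful path.

    Hypothetical helper: move t (0-indexed) receives
    final_reward * gamma ** (num_moves - 1 - t), so the last move gets the
    full reward and earlier moves get geometrically less.
    final_reward and gamma are assumed values, not taken from the scenario.
    """
    if num_moves <= 0:
        return []
    return [final_reward * gamma ** (num_moves - 1 - t) for t in range(num_moves)]


# A successful path of four moves: the earliest move gets the smallest share.
rewards = per_move_rewards(4)
for step, value in enumerate(rewards, start=1):
    print(f"move {step}: reward {value:.3f}")
```

With this scheme every move on the successful path gets a nonzero learning signal instead of only the final one. A smaller `gamma` concentrates credit on the last few moves, while `gamma = 1.0` gives every move the same reward.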
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An agent is learning to generate a five-sentence summary of a document. It only receives a final quality score (e.g., +0.9) after the entire summary is complete. To improve training, this single final score is used to create a learning signal for each of the five sentences generated. Which of the following options best analyzes how this transformation from a single score to multiple signals works?
Reward Signal Transformation in a Sequential Task