Short Answer

Reward Signal Transformation in a Sequential Task

A robot arm is trained to stack three blocks in a specific order. It only receives a reward of +10 after placing the third and final block, and only if the entire stack is correct. For all intermediate actions (placing the first and second blocks), the reward is zero. Describe how this single, final reward can be transformed into a dense supervision signal for each of the three actions. Explain why this transformation helps the robot learn more effectively.

0

1

Updated 2025-10-10

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science