Deconstructing a Shaped Reward Function

An AI agent is being trained to navigate a grid world to reach a goal square. The original reward function, r, provides +10 for reaching the goal and -0.1 for every other step. To encourage faster learning, a new, transformed reward function, r', is implemented. For a specific step that moves the agent from a square 5 units away from the goal to a square 4 units away, the agent receives a total transformed reward r' of +0.9. Based on the general formula for reward shaping, r'(s_t, a_t, s_{t+1}) = r(s_t, a_t, s_{t+1}) + f(s_t, a_t, s_{t+1}), what are the numerical values for the original reward r and the shaping function f for this specific step? Explain the purpose of the shaping function in this context.
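The numbers in the question can be checked with a short sketch. This is an illustrative Python snippet, not part of the original problem: it assumes a potential-based shaping term f = gamma * Phi(s_{t+1}) - Phi(s_t) with the hypothetical potential Phi(s) = -(distance to goal) and gamma = 1, a common choice that reproduces the reward values given.

```python
def original_reward(next_dist):
    """Original reward r: +10 on reaching the goal, -0.1 otherwise."""
    return 10.0 if next_dist == 0 else -0.1

def potential(dist):
    """Assumed potential function: Phi(s) = -(distance to goal)."""
    return -float(dist)

def shaping(dist, next_dist, gamma=1.0):
    """Potential-based shaping term f = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_dist) - potential(dist)

# The step in the question: distance 5 -> distance 4 (goal not yet reached).
r = original_reward(4)   # -0.1 (a normal step, not the goal)
f = shaping(5, 4)        # +1.0 (reward for moving one unit closer)
r_shaped = r + f         # +0.9, matching the transformed reward r'
```

Under these assumptions the step decomposes as r = -0.1 and f = +1.0, so the shaping term supplies the intermediate progress signal that the sparse original reward lacks.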

Updated 2025-10-04

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science