Short Answer

Calculating Episode Return

An agent navigates a maze and completes one episode. During the episode, it receives the following rewards, one per time step: [-1, -1, -1, +10]. The final reward of +10 is received upon reaching the goal, which ends the episode. Calculate the total reward (return) for this episode and briefly explain what this value signifies for the agent.
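One way to check the arithmetic is the short Python sketch below. It assumes "total reward" means the plain, undiscounted sum of the rewards. The function name `episode_return` and the optional `gamma` discount parameter are illustrative additions, not part of the question.

```python
def episode_return(rewards, gamma=1.0):
    """Return the sum of rewards, each weighted by gamma**t.

    gamma=1.0 (the assumed default here) means no discounting,
    so the result is simply the total reward of the episode.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


rewards = [-1, -1, -1, 10]  # three step penalties, then the goal reward
print(episode_return(rewards))  # 7.0
```

Under that assumption the return is (-1) + (-1) + (-1) + 10 = 7. It summarizes the whole episode in one number: the goal reward outweighed the three step penalties, and a shorter path to the goal would have produced a higher return.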

Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science