Case Study

Evaluating a New Robotic Arm Policy

A robotics team wants to evaluate a new policy for a robotic arm, denoted π_new. To do this, they need to estimate its on-policy performance measure, J(π_new). They have access to two datasets of the arm's interactions with its environment. Based on the case study details below, which dataset should they use, and why is the other dataset unsuitable for calculating this specific performance measure?
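As background for the question, the on-policy performance measure is commonly defined as the expected return of trajectories sampled by running the policy itself, so a plain Monte Carlo estimate averages the returns of episodes collected under π_new. The sketch below illustrates that idea only; the function names, the reward-list input format, and the discount factor `gamma` are assumptions made for illustration and do not come from the case study.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return the discounted sum of one episode's rewards."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_on_policy_performance(
    reward_sequences: Sequence[Sequence[float]], gamma: float = 1.0
) -> float:
    """Monte Carlo estimate of J(pi): the mean return over episodes.

    Assumption: every episode in reward_sequences was generated by
    running the policy being evaluated. Averaging episodes produced by
    a different policy would estimate that other policy's performance.
    """
    if not reward_sequences:
        raise ValueError("need at least one episode")
    returns = [discounted_return(rewards, gamma) for rewards in reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical reward sequences from two episodes run under the evaluated policy.
episodes = [[1.0, 1.0], [0.0, 2.0]]
estimate = estimate_on_policy_performance(episodes, gamma=0.5)
```

The estimate is meaningful for π_new only under the stated assumption about how the episodes were collected, which is the point the question asks about.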

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science