True/False

Consider an off-policy evaluation scenario where the performance of a 'target' policy is estimated using data collected under a 'reference' (behavior) policy. If the target policy is identical to the reference policy, the importance sampling weight used to adjust the reward of every possible trajectory will be exactly 1.

0

1
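To see why the statement holds, recall that a trajectory's importance sampling weight is the product over timesteps of the ratio of the target policy's action probability to the reference policy's. A minimal sketch (the policies, states, and actions below are hypothetical, chosen only for illustration):

```python
def trajectory_weight(traj, target, reference):
    """Importance sampling weight: product of per-step action-probability ratios."""
    w = 1.0
    for state, action in traj:
        w *= target(state)[action] / reference(state)[action]
    return w

# Hypothetical two-action policies: each maps a state to action probabilities.
policy_a = lambda s: {"left": 0.7, "right": 0.3}
policy_b = lambda s: {"left": 0.4, "right": 0.6}

traj = [("s0", "left"), ("s1", "right"), ("s2", "left")]

# Identical target and reference: every per-step ratio is 1, so the weight is exactly 1.
print(trajectory_weight(traj, policy_a, policy_a))  # 1.0

# Distinct policies: the weight can deviate from 1.
print(trajectory_weight(traj, policy_a, policy_b))
```

When target and reference coincide, each ratio equals 1 regardless of the states visited or actions taken, so the product is 1 for every possible trajectory.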

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science