Multiple Choice

An agent is operating under a policy parameterized by $\theta$. This policy can result in one of two possible trajectories. Trajectory A has a total reward of 20 and a 70% probability of occurring. Trajectory B has a total reward of -10 and a 30% probability of occurring. Given that the performance of a policy is measured by the expected cumulative reward over all possible trajectories, $J(\theta) = \sum_{\tau} \text{Pr}_{\theta}(\tau) R(\tau)$, what is the value of the performance function $J(\theta)$ for this policy?
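
The expected-reward sum in the question can be checked with a short script. This is a minimal sketch, not part of the original question: the helper name `expected_return` and the `(probability, reward)` pair layout are assumptions made for illustration, and the numbers are the ones given for Trajectories A and B.

```python
import math


def expected_return(trajectories):
    """Compute J(theta) as the sum of Pr(tau) * R(tau) over all trajectories.

    `trajectories` is a list of (probability, total_reward) pairs.
    This helper is hypothetical and exists only for this example.
    """
    total_prob = sum(p for p, _ in trajectories)
    # The probabilities must form a valid distribution over trajectories.
    if not math.isclose(total_prob, 1.0):
        raise ValueError("trajectory probabilities must sum to 1")
    return sum(p * r for p, r in trajectories)


# (probability, total reward) for Trajectory A and Trajectory B
trajectories = [(0.7, 20.0), (0.3, -10.0)]
j_theta = expected_return(trajectories)
print(j_theta)
```

Each trajectory contributes its reward weighted by its probability, so the positive reward of the likely trajectory is partly offset by the negative reward of the unlikely one: $0.7 \times 20 + 0.3 \times (-10) = 14 - 3 = 11$.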


Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science