An agent in an environment completes a sequence of two actions. It starts in an initial state s₀, performs action a₀ to reach state s₁, and then performs action a₁ to reach the final state s₂. Which of the following notations correctly represents the full sequence of state-action pairs, often called a trajectory (τ)?
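The two-step rollout described above can be sketched in code. This is a minimal illustration, not the question's answer key: the names `rollout_to_trajectory` and `state_action_pairs` and the string labels are assumptions made for the example. It shows two conventions that appear in the literature, an interleaved sequence that ends with the final state, and a list of state-action pairs in which the final state has no action attached.

```python
from typing import List, Tuple

State = str   # hypothetical labels; real states are usually vectors or token sequences
Action = str


def rollout_to_trajectory(states: List[State], actions: List[Action]) -> List[str]:
    """Interleave states and actions as s0, a0, s1, a1, ..., ending with the last state."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    trajectory: List[str] = []
    for state, action in zip(states, actions):
        trajectory.extend([state, action])
    trajectory.append(states[-1])  # the final state has no action taken from it
    return trajectory


def state_action_pairs(states: List[State], actions: List[Action]) -> List[Tuple[State, Action]]:
    """Pair each action with the state it was taken in; zip drops the final state."""
    return list(zip(states, actions))


states = ["s0", "s1", "s2"]
actions = ["a0", "a1"]

tau = rollout_to_trajectory(states, actions)
pairs = state_action_pairs(states, actions)

print(tau)    # ['s0', 'a0', 's1', 'a1', 's2']
print(pairs)  # [('s0', 'a0'), ('s1', 'a1')]
```

With two actions, the interleaved form has five elements and the paired form has two entries, so the two conventions differ in whether the final state s₂ is recorded.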
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Cumulative Reward of a Trajectory
An agent in an environment completes a sequence of two actions. It starts in an initial state s₀, performs action a₀ to reach state s₁, and then performs action a₁ to reach the final state s₂. Which of the following notations correctly represents the full sequence of state-action pairs, often called a trajectory (τ)?
Critiquing Trajectory Notations
An agent interacts with an environment for a total of T time steps, resulting in a sequence of states and actions. Match each mathematical notation for this sequence (trajectory, τ) to the description that accurately characterizes its structure and length.