Critiquing Trajectory Notations
An agent starts in an initial state, takes a single action, and arrives at a final state. Two different notations are proposed to represent this entire sequence of events:
Notation 1: τ = {(s₀, a₀)}
Notation 2: τ = {(s₀, a₀), (s₁, a₁)}
For each notation, explain one specific reason why it could be considered an incomplete or potentially misleading representation of the described sequence.
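One way to check each notation against the scenario is to write out what actually happened and compare. The following is a minimal sketch, not a definitive treatment: the string labels ("s0", "a0", ...) and the helper names `rollout`, `states_in`, and `actions_in` are assumptions introduced for illustration, and the environment is reduced to a stub that only names the states visited.

```python
from typing import List, Tuple

State = str
Action = str
Pair = Tuple[State, Action]


def rollout(initial_state: State, actions: List[Action]) -> List[State]:
    """Hypothetical stub: name the states visited, one new state per action."""
    states = [initial_state]
    for i in range(1, len(actions) + 1):
        states.append(f"s{i}")
    return states


def states_in(trajectory: List[Pair]) -> List[State]:
    return [s for s, _ in trajectory]


def actions_in(trajectory: List[Pair]) -> List[Action]:
    return [a for _, a in trajectory]


# The scenario: one action taken from the initial state.
actions_taken = ["a0"]
states_visited = rollout("s0", actions_taken)

# The two proposed notations, written as lists of (state, action) pairs.
notation_1 = [("s0", "a0")]
notation_2 = [("s0", "a0"), ("s1", "a1")]

# States the agent visited that Notation 1 never mentions.
missing_states_1 = [s for s in states_visited if s not in states_in(notation_1)]

# Actions Notation 2 lists that the agent never took.
extra_actions_2 = [a for a in actions_in(notation_2) if a not in actions_taken]

print(missing_states_1)
print(extra_actions_2)
```

Under these assumptions, the comparison points at the two candidate critiques: the first notation leaves out a state the agent reached, and the second pairs the final state with an action that the scenario never describes.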
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Cumulative Reward of a Trajectory
An agent in an environment completes a sequence of two actions. It starts in an initial state s₀, performs action a₀ to reach state s₁, and then performs action a₁ to reach the final state s₂. Which of the following notations correctly represents the full sequence of state-action pairs, often called a trajectory (τ)?
Critiquing Trajectory Notations
An agent interacts with an environment for a total of T time steps, resulting in a sequence of states and actions. Match each mathematical notation for this sequence (trajectory, τ) to the description that accurately characterizes its structure and length.