Definition

Notational Variations in State-Action Sequences (Trajectories)

A state-action sequence, or trajectory (τ), records the path an agent takes through an environment as an ordered list of state-action pairs. The concept is the same everywhere, but the notation varies. A trajectory may be indexed from time step 1, as in τ = {(s₁, a₁), (s₂, a₂), ...}, often to match other notation in the same context, such as sequence prediction. Reinforcement learning literature more commonly indexes from time step 0, with either of two end points: τ = {(s₀, a₀), ..., (s_T, a_T)}, which contains T+1 pairs, or τ = {(s₀, a₀), ..., (s_{T-1}, a_{T-1})}, which contains T pairs. These differences are a matter of convention and do not change the underlying principles or models, but the starting index and final time step should be checked before comparing formulas across sources.
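As a minimal sketch of the point above, the following Python snippet labels the same sequence of state-action pairs under the 0-based and 1-based conventions. The helper name `index_trajectory` and the toy state and action strings are hypothetical and chosen only for illustration; they do not come from any particular library or from the source text.

```python
from typing import Dict, List, Tuple

# A (state, action) pair; plain strings stand in for real states and actions.
Pair = Tuple[str, str]


def index_trajectory(pairs: List[Pair], start: int = 0) -> Dict[int, Pair]:
    """Attach time-step labels to the pairs, beginning at `start`.

    Hypothetical helper: only the labels change, never the pairs themselves.
    """
    return {start + i: pair for i, pair in enumerate(pairs)}


# Three state-action pairs from a toy episode (illustrative values).
pairs = [("s_a", "left"), ("s_b", "right"), ("s_c", "stay")]

zero_based = index_trajectory(pairs, start=0)  # labels 0, 1, 2
one_based = index_trajectory(pairs, start=1)   # labels 1, 2, 3

# Same content under both conventions; only the time-step labels differ.
same_content = list(zero_based.values()) == list(one_based.values())
```

With three pairs, the 0-based labels end at T-1 = 2 if T denotes the number of pairs, or at T = 2 if T denotes the final time step. This is the same ambiguity as between the two 0-based forms given above.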


Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
