1Cademy - Trajectory Generation as a Markov Decision Process

Concept

Trajectory Generation as a Markov Decision Process

The process of generating a sequence, or trajectory (τ), can be formally modeled as a Markov Decision Process (MDP). This framework is essential for applying reinforcement learning to sequential tasks, as it defines the states, actions, and transition probabilities that govern the generation of trajectories under a given policy.
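
As a minimal sketch of this idea, the Python snippet below samples a trajectory from a tiny MDP and computes its probability as the product P(τ) = P(s₀) · ∏ π(a_t|s_t) · P(s_{t+1}|s_t, a_t). The two-state, two-action MDP, its probabilities, and the helper names (sample, generate_trajectory, trajectory_probability) are all hypothetical and chosen only for illustration; they do not come from the referenced course.

```python
import random

# Hypothetical two-state, two-action MDP, used only for illustration.
INITIAL = {"s0": 1.0}  # P(s_0): distribution over initial states
POLICY = {  # pi(a | s): action probabilities for each state
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 0.25, "right": 0.75},
}
TRANSITION = {  # P(s' | s, a): next-state probabilities
    ("s0", "left"): {"s0": 0.5, "s1": 0.5},
    ("s0", "right"): {"s1": 1.0},
    ("s1", "left"): {"s0": 1.0},
    ("s1", "right"): {"s0": 0.25, "s1": 0.75},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_trajectory(horizon, rng):
    """Return [s_0, a_0, s_1, ..., a_{T-1}, s_T] with T = horizon."""
    state = sample(INITIAL, rng)
    trajectory = [state]
    for _ in range(horizon):
        # The action depends only on the current state (the policy).
        action = sample(POLICY[state], rng)
        # The next state depends only on the current state and action.
        state = sample(TRANSITION[(state, action)], rng)
        trajectory += [action, state]
    return trajectory


def trajectory_probability(trajectory):
    """Multiply P(s_0) by pi(a_t | s_t) * P(s_{t+1} | s_t, a_t) per step."""
    prob = INITIAL.get(trajectory[0], 0.0)
    for i in range(0, len(trajectory) - 2, 2):
        s, a, s_next = trajectory[i:i + 3]
        prob *= POLICY[s].get(a, 0.0) * TRANSITION[(s, a)].get(s_next, 0.0)
    return prob


rng = random.Random(0)  # fixed seed so the run is repeatable
tau = generate_trajectory(3, rng)
print(tau, trajectory_probability(tau))
```

Storing the trajectory as a flat list that alternates states and actions keeps the sketch short; a list of (state, action, next_state) tuples would work equally well.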

Updated 2026-05-02

References

Reference of Foundations of Large Language Models Course

Learn After

Objective Function as Expected Cumulative Reward (Performance Function)
An agent operates in an environment where sequences of events unfold over time. The agent's behavior is described by a policy, denoted π(a|s), which gives the probability of taking action a when in state s. The environment's dynamics are described by a transition function, P(s'|s, a), which gives the probability of moving to the next state s' after taking action a in state s. The process begins from an initial state s₀, which is drawn with probability P(s₀).

Consider the following specific
Diagnosing a Faulty Sequence Generation Process
When modeling the generation of a sequence of states and actions as a Markov Decision Process, the probability of transitioning to a new state at any given step depends on the complete history of all states and actions that have occurred since the beginning of the sequence.
Notational Variations in State-Action Sequences (Trajectories)
An agent is generating a sequence by interacting with an environment. For a single time step, starting from state s_t, arrange the following events in the correct logical order.