Learn Before
Concept

Causality in Reinforcement Learning

In reinforcement learning, policy decisions operate under a causality constraint. This means that an action selected at a specific time step, tt, can only impact rewards obtained at or after that time (rt,rt+1,...r_t, r_{t+1}, ...). Rewards received prior to time tt are considered unchangeable or 'fixed' from the perspective of the action at tt.

0

1

Updated 2026-02-05

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related