1Cademy - In the context of applying reinforcement learning to a language model, the model's strategy is defined by the policy formula: $$\pi(a|s) = \text{Pr}(y_t | \mathbf{x}, \mathbf{y}_{<t}).$$ Match each component of this formulation to its correct description.

Learn Before

Policy Formula for LLMs in Reinforcement Learning

Matching

In the context of applying reinforcement learning to a language model, the model's strategy is defined by the policy formula: $\pi(a|s) = \text{Pr}(y_t | \mathbf{x}, \mathbf{y}_{<t})$. Match each component of this formulation to its correct description.
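
To make the notation concrete, here is a minimal Python sketch of how the formula can be read: the state $s$ is the prompt $\mathbf{x}$ together with the tokens generated so far $\mathbf{y}_{<t}$, the action $a$ is the next token $y_t$, and the policy $\pi$ is the model's probability distribution over next tokens. The vocabulary `VOCAB`, the scoring function `toy_logits`, and the example prompt are hypothetical stand-ins for a real language model, chosen only for illustration.

```python
import math

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "<eos>"]


def toy_logits(x, y_prefix):
    """Stand-in for an LLM forward pass: one score per vocabulary token.

    The scoring rule is arbitrary (an assumption for illustration); it only
    ensures that the scores depend on the state (x, y_<t).
    """
    context_len = len(x) + len(y_prefix)
    return [math.sin(context_len + i) for i in range(len(VOCAB))]


def policy(x, y_prefix):
    """pi(. | s): distribution over next tokens given state s = (x, y_<t)."""
    logits = toy_logits(x, y_prefix)
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(VOCAB, exps)}


x = ["describe", "a", "pet"]   # prompt x
y_prefix = ["the", "cat"]      # tokens generated so far, y_<t
dist = policy(x, y_prefix)     # Pr(y_t | x, y_<t) for every candidate y_t
action_prob = dist["sat"]      # pi(a | s) for the action a = "sat"
```

In this reading, choosing an action means picking (or sampling) one token from `dist`. The chosen token is then appended to `y_prefix`, which produces the next state.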