Matching

In the context of applying reinforcement learning to a language model, the model's strategy is defined by the policy $\pi(a \mid s) = \Pr(y_t \mid \mathbf{x}, \mathbf{y}_{<t})$. Match each component of this formulation to its correct description.
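To make the formula concrete, here is a minimal Python sketch. It assumes a toy four-token vocabulary and uses a hypothetical hand-written scoring rule (`toy_logits`) in place of a real network. The state $s$ is the pair (prompt $\mathbf{x}$, prefix $\mathbf{y}_{<t}$), the action $a$ is the next token $y_t$, and the policy returns $\Pr(y_t \mid \mathbf{x}, \mathbf{y}_{<t})$ through a softmax. The names `VOCAB`, `toy_logits`, `policy`, `sequence_log_prob`, and `sample_response` are illustrative only and do not come from any library.

```python
import math
import random
from typing import Dict, List

# Hypothetical toy vocabulary; a real LM has tens of thousands of tokens.
VOCAB: List[str] = ["the", "cat", "sat", "<eos>"]


def toy_logits(x: List[str], y_prefix: List[str]) -> Dict[str, float]:
    """Hypothetical scorer standing in for the LM network.

    For simplicity it ignores the prompt x and scores token i by how close
    i is to the current prefix length, so "the cat sat <eos>" is the most
    likely continuation. A real LM conditions on both x and y_prefix.
    """
    t = len(y_prefix)
    return {tok: float(-abs(i - t)) for i, tok in enumerate(VOCAB)}


def policy(x: List[str], y_prefix: List[str]) -> Dict[str, float]:
    """pi(a | s) with state s = (x, y_<t) and action a = y_t.

    Returns Pr(y_t | x, y_<t) for every candidate token via a softmax.
    """
    logits = toy_logits(x, y_prefix)
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def sequence_log_prob(x: List[str], y: List[str]) -> float:
    """log Pr(y | x) = sum over t of log pi(y_t | x, y_<t) (chain rule)."""
    return sum(math.log(policy(x, y[:t])[y[t]]) for t in range(len(y)))


def sample_response(x: List[str], max_len: int = 10, seed: int = 0) -> List[str]:
    """Roll out the policy one action (token) at a time until <eos>."""
    rng = random.Random(seed)
    y: List[str] = []
    for _ in range(max_len):
        probs = policy(x, y)  # current state s = (x, y)
        tok = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        y.append(tok)  # taking action a = tok extends the state
        if tok == "<eos>":
            break
    return y


if __name__ == "__main__":
    prompt = ["describe", "the", "cat"]
    print(policy(prompt, []))  # distribution over the first token y_1
    print(sequence_log_prob(prompt, ["the", "cat"]))
    print(sample_response(prompt))
```

The chain rule in `sequence_log_prob` is what connects the per-token actions to the probability of a whole response; policy-gradient methods such as REINFORCE typically weight this log-probability by a reward signal.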


Updated 2025-10-08


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science