Dynamic State in LLM Policy
An autoregressive language model is generating a response. Its policy for choosing the next token is defined by the formula: π(a|s) = Pr(y_t | x, y_<t). Explain how the state s changes from one token-generation step to the next, and describe why this change is fundamental to the model's ability to produce coherent text.
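As a rough illustration of the formula, the sketch below treats the state as the pair (x, y_<t) and the action as the next token y_t. The names `toy_policy` and `rollout`, the canned token list, and the `<eos>` marker are hypothetical stand-ins for illustration only; a real model would sample from a learned distribution rather than read from a fixed list.

```python
from typing import Callable, List, Sequence, Tuple

# A state s_t is the pair (prompt x, tokens generated so far y_<t).
State = Tuple[Tuple[str, ...], Tuple[str, ...]]


def toy_policy(state: State) -> str:
    """Hypothetical stand-in for picking y_t from Pr(y_t | x, y_<t).

    It ignores the prompt and returns a canned token chosen by the
    length of the prefix, which is enough to show the state evolving.
    """
    _prompt, prefix = state
    canned = ["exceptionally", "sunny", "<eos>"]
    return canned[min(len(prefix), len(canned) - 1)]


def rollout(
    prompt: Sequence[str],
    policy: Callable[[State], str],
    max_steps: int = 10,
) -> List[Tuple[State, str]]:
    """Generate tokens one at a time, recording each (state, action) pair."""
    state: State = (tuple(prompt), ())  # s_1: the prompt and an empty prefix
    trajectory: List[Tuple[State, str]] = []
    for _ in range(max_steps):
        action = policy(state)  # a_t = y_t
        trajectory.append((state, action))
        if action == "<eos>":
            break
        x, prefix = state
        # The next state appends the chosen token to the prefix,
        # so every later decision is conditioned on it.
        state = (x, prefix + (action,))
    return trajectory


if __name__ == "__main__":
    for step, (s, a) in enumerate(rollout(["The", "weather", "today", "is"], toy_policy), 1):
        print(step, s[1], "->", a)
```

In this sketch the prompt x stays fixed while the prefix y_<t grows by exactly one token per step, so each action becomes part of the state that conditions the next decision.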
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model is given the input prompt 'The weather today is' and has so far generated the token ' exceptionally'. The model is now deciding on the very next token to produce. In a reinforcement learning context where the model's policy is defined as the probability of taking an action 'a' in a state 's', which of the following correctly identifies the state and action for this specific decision-making step?
In the context of applying reinforcement learning to a language model, the model's strategy is defined by the policy formula π(a|s) = Pr(y_t | x, y_<t). Match each component of this formulation to its correct description.