1Cademy

Learn Before

Policy Formula for LLMs in Reinforcement Learning

Multiple Choice

An autoregressive language model is given the input prompt 'The weather today is' and has so far generated the token ' exceptionally'. The model is now deciding on the very next token to produce. In a reinforcement learning context where the model's policy is defined as the probability of taking an action 'a' in a state 's', which of the following correctly identifies the state and action for this specific decision-making step?
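
To make the policy notation concrete, here is a minimal sketch of this decision step, assuming the usual convention that the state is every token the model conditions on (the prompt plus the tokens generated so far) and the action is the single next token chosen from the vocabulary. The vocabulary, the word-level tokenization of the prompt, and the `toy_logits` scoring function are all hypothetical stand-ins for a real model.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = [" sunny", " cold", " warm", " exceptionally", "."]

def toy_logits(state):
    """Return one made-up score per vocabulary token for the given state.

    Assumption: a real model would compute these with a neural network.
    Here the scores depend only on the state's length, for illustration.
    """
    return [float((len(state) + i) % 3) for i in range(len(VOCAB))]

def policy(state):
    """pi(a | s): softmax over the toy logits, one probability per action."""
    logits = toy_logits(state)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(VOCAB, exps)}

# Assumed word-level tokenization of the prompt from the question.
prompt = ["The", " weather", " today", " is"]
generated = [" exceptionally"]

# State s: the prompt followed by everything generated so far.
state = tuple(prompt + generated)

# Action a: one next token, here picked greedily from pi(. | s).
probs = policy(state)
action = max(probs, key=probs.get)
```

In this sketch, `probs[action]` plays the role of the policy value for the chosen action in the current state. Once the action is taken, the chosen token is appended to the sequence and becomes part of the next state.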

Updated 2025-09-29
