Multiple Choice

An autoregressive language model is given the input prompt 'The weather today is' and has so far generated the token ' exceptionally'. The model is now deciding on the very next token to produce. In a reinforcement learning context where the model's policy is defined as the probability of taking an action 'a' in a state 's', which of the following correctly identifies the state and action for this specific decision-making step?
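
As a rough illustration of the framing the question relies on, the sketch below treats next-token generation as one step of a policy. It assumes the common convention that the state is every token seen so far and the action is the single token chosen next. The vocabulary, the `toy_logits` scoring function, and the token boundaries are all hypothetical stand-ins, not a real model or tokenizer.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = [" sunny", " cold", " warm", " rainy"]


def toy_logits(state):
    """Stand-in for a language model: returns one score per vocabulary token.

    The scores depend only on the length of the state, so they are purely
    illustrative and carry no linguistic meaning.
    """
    n = len(state)
    return [((i + 1) * n) % 7 for i in range(len(VOCAB))]


def policy(state):
    """pi(a | s): softmax over the scores of each candidate next token."""
    logits = toy_logits(state)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(VOCAB, exps)}


# Assumed tokenization of the prompt and of the text generated so far.
prompt = ("The", " weather", " today", " is")
generated = (" exceptionally",)

# State s: everything the model conditions on at this step.
state = prompt + generated

# Action a: the one token selected next (greedy choice here, for simplicity).
probs = policy(state)
action = max(probs, key=probs.get)

# Taking the action appends the token, which yields the next state.
next_state = state + (action,)
```

Under these assumptions, `probs[action]` is the value of the policy for this one decision, and each further generated token repeats the same state-to-action step with a state that is one token longer.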

Updated 2025-09-29

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science