1Cademy - A language model is generating a response to the prompt 'The best way to learn a new skill is to...'. So far, it has produced the sequence 'The best way to learn a new skill is to practice'. At this exact point in the generation process, what constitutes the model's next 'action' within a reinforcement learning framework?

Learn Before

Action in the Context of LLMs

Multiple Choice

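In the usual reinforcement learning framing of text generation, the state is the prompt plus every token generated so far, and an action is the choice of a single next token from the model's vocabulary. The sketch below illustrates that framing under loudly stated assumptions. The six-word `VOCAB`, the fixed logits inside the hypothetical `policy` function, and the whitespace "tokenization" are all illustrative placeholders and do not come from this page. A real model uses a subword vocabulary and computes its logits from the whole sequence.

```python
import math
import random

# Hypothetical toy vocabulary (assumption); real models use large subword vocabularies.
VOCAB = ["regularly", "daily", ".", ",", "and", "it"]


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state):
    """Return a probability distribution over VOCAB for the given state.

    Placeholder (assumption): the logits are made up and ignore the state.
    A real language model would compute them from the full token sequence.
    """
    logits = [2.0, 1.5, 1.0, 0.5, 0.2, 0.1]
    return softmax(logits)


def take_action(state, rng):
    # The action is exactly ONE token sampled from the policy's distribution.
    probs = policy(state)
    token = rng.choices(VOCAB, weights=probs, k=1)[0]
    # Transition: the next state is the current state with that token appended.
    return token, state + [token]


# Whitespace splitting stands in for a real tokenizer (simplification).
state = "The best way to learn a new skill is to practice".split()
rng = random.Random(0)
action, next_state = take_action(state, rng)
print("action:", action)
print("next state ends with:", next_state[-2:])
```

The design point is that the action space is the vocabulary, not whole sentences or responses: each call to `take_action` picks one token, and generation is a sequence of such steps.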

Updated 2025-10-01
