Learn Before
A language model is generating a response to the prompt 'The best way to learn a new skill is to...'. So far, it has produced the sequence 'The best way to learn a new skill is to practice'. At this exact point in the generation process, what constitutes the model's next 'action' within a reinforcement learning framework?
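One way to make the question concrete is a minimal sketch, assuming the common token-level framing in which the state is the text so far and the policy is the model's distribution over the next token. The function names `toy_policy` and `sample_action`, the three candidate tokens, and their probabilities are all hypothetical and chosen only for illustration; a real language model would compute this distribution with a neural network over its full vocabulary.

```python
import random

def toy_policy(state):
    """Hypothetical stand-in for the model: maps a state (tokens so far)
    to a probability distribution over candidate next tokens.
    The tokens and probabilities here are made up for illustration."""
    return {"regularly": 0.5, "daily": 0.3, ".": 0.2}

def sample_action(state, rng):
    """Sample one next token (the 'action') from the policy's distribution."""
    probs = toy_policy(state)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# State: the prompt plus everything generated so far.
prompt = ["The", "best", "way", "to", "learn", "a", "new", "skill", "is", "to"]
generated = ["practice"]
state = tuple(prompt + generated)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
action = sample_action(state, rng)  # the action is a single next token
next_state = state + (action,)      # taking the action extends the sequence

print("action:", action)
print("next state:", " ".join(next_state))
```

Under this framing, the action at the point described in the question is the choice of one next token, and the new state is the old sequence with that token appended.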
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Formula for LLMs in Reinforcement Learning
Comparing 'Action' in Different Reinforcement Learning Scenarios
Identifying the Action in LLM Fine-Tuning