Learn Before
Equivalence of Language Model and Policy
In the context of applying reinforcement learning to text generation, explain the relationship between the language model's conditional probability distribution and the policy. Why is it possible to treat them as equivalent?
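One way to make the question concrete is a small sketch. It assumes a toy lookup table in place of a real language model and reuses the next-word probabilities from the 'The cat sat on the' example listed under Related. The names `next_token_distribution`, `policy`, `sample_action`, and `log_prob` are hypothetical and chosen only for illustration. The state is the text generated so far, the action is the next token, and the policy is the same function as the model's conditional distribution.

```python
import math
import random

# Hypothetical toy "language model": maps a prefix to a next-token
# distribution. A real model would compute this with a neural network.
def next_token_distribution(prefix):
    table = {
        ("The", "cat", "sat", "on", "the"): {"mat": 0.7, "chair": 0.2, "roof": 0.1},
    }
    return table[tuple(prefix)]

# RL view: the state is the prefix, the action is the next token, and
# pi(action | state) is exactly the model's conditional distribution.
def policy(state):
    return next_token_distribution(state)

def sample_action(state, rng):
    # Draw one action (token) according to pi(. | state).
    dist = policy(state)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def log_prob(state, action):
    # log pi(action | state), the quantity policy-gradient methods differentiate.
    return math.log(policy(state)[action])

state = ["The", "cat", "sat", "on", "the"]
rng = random.Random(0)
action = sample_action(state, rng)
next_state = state + [action]  # taking the action appends the token to the state
```

In this sketch `policy` adds nothing beyond calling the model, which is the sense in which the two can be treated as equivalent: generating a token and taking an action are the same sampling step.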
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient Utility for Sequence Generation
A language model is tasked with generating a sentence. After producing the partial sequence 'The cat sat on the', it computes the following probability distribution for the next word: {'mat': 0.7, 'chair': 0.2, 'roof': 0.1}. If we frame this generation process using reinforcement learning, how is this probability distribution correctly interpreted?
Equivalence of Language Model and Policy
Conceptual Error in RL Fine-Tuning