Learn Before
A language model is tasked with generating a sentence. After producing the partial sequence 'The cat sat on the', it computes the following probability distribution for the next word: {'mat': 0.7, 'chair': 0.2, 'roof': 0.1}. If we frame this generation process using reinforcement learning, how is this probability distribution correctly interpreted?
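In the RL framing, the partial sequence is the state, the candidate next words are the actions, and the distribution is the policy π(a | s). A minimal sketch of this interpretation, using hypothetical variable names and the distribution from the question:

```python
import random

# The language model's next-word distribution, read as an RL policy pi(a | s).
state = "The cat sat on the"                        # state s_t: the partial sequence
policy = {"mat": 0.7, "chair": 0.2, "roof": 0.1}    # pi(a | s_t) over actions (next words)

# A policy must be a valid probability distribution over the action space.
assert abs(sum(policy.values()) - 1.0) < 1e-9

# One generation step = sampling an action from the policy,
# then transitioning to the next state by appending the chosen word.
action = random.choices(list(policy), weights=list(policy.values()), k=1)[0]
next_state = state + " " + action
```

Under this reading, greedy decoding corresponds to always taking the argmax action, while sampling-based decoding corresponds to acting stochastically under the policy.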
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient Utility for Sequence Generation
Equivalence of Language Model and Policy
Conceptual Error in RL Fine-Tuning