Learn Before
Conceptual Error in RL Fine-Tuning
Based on the standard formulation for applying reinforcement learning to sequence generation, identify the primary conceptual misunderstanding in the engineer's proposed architecture and explain why it is incorrect.
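For reference, the "standard formulation" the question points to is typically the policy-gradient (REINFORCE-style) setup below, in which the language model itself serves as the policy. This is a sketch for orientation only; the symbols (prompt x, sampled sequence y, reward R, parameters \theta) are assumed notation, not taken from the engineer's scenario.

% The language model is the policy, factored over next-token decisions:
\pi_\theta(y \mid x) = \prod_{t=1}^{T} \pi_\theta(y_t \mid x, y_{<t})

% RL fine-tuning maximizes the expected reward of sampled sequences:
J(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[ R(x, y) \big]

% REINFORCE gradient estimate used to update the model:
\nabla_\theta J(\theta) = \mathbb{E}_{y \sim \pi_\theta}\Big[ R(x, y) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(y_t \mid x, y_{<t}) \Big]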
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient Utility for Sequence Generation
A language model is tasked with generating a sentence. After producing the partial sequence 'The cat sat on the', it computes the following probability distribution for the next word: {'mat': 0.7, 'chair': 0.2, 'roof': 0.1}. If we frame this generation process using reinforcement learning, how is this probability distribution correctly interpreted? (A runnable sketch of this framing follows this list.)
Equivalence of Language Model and Policy
Conceptual Error in RL Fine-Tuning
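As noted above, here is a minimal runnable sketch of the framing in 'Policy Gradient Utility for Sequence Generation': the partial sequence is the RL state, each candidate next word is an action, and the model's output distribution is the policy itself, from which the next action is sampled. The variable names are illustrative assumptions only.

import random

state = "The cat sat on the"                      # s_t: the partial sequence so far
policy = {"mat": 0.7, "chair": 0.2, "roof": 0.1}  # pi(a | s_t): the model's next-word distribution

# Acting under the policy means sampling the next token from this distribution.
words, probs = zip(*policy.items())
action = random.choices(words, weights=probs, k=1)[0]

# The state transition is deterministic: append the chosen word.
next_state = f"{state} {action}"
print(next_state)  # e.g. "The cat sat on the mat"

The key point this encodes, consistent with the related item 'Equivalence of Language Model and Policy': the distribution is not a reward or a value estimate; it is the policy pi(a | s) evaluated at the current state.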