Learn Before
True or False: In a Reinforcement Learning from Human Feedback (RLHF) system, the reference policy is a function that is trained to approximate the probability distribution generated by the reference model.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a reinforcement learning process that uses human feedback, a 'reference model' with a fixed set of parameters, θ_ref, is used as a baseline. For a specific input prompt x, this model calculates that the probability of generating the word 'consequently' as the next word is 0.04. Given that the reference policy, π_ref, is formally defined as the probability distribution generated by this reference model, what is the value of π_ref('consequently' | x)?
True or False: In a Reinforcement Learning from Human Feedback (RLHF) system, the reference policy is a function that is trained to approximate the probability distribution generated by the reference model.
Reference Policy and Model Probability
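Both questions above turn on the same definitional point: the reference policy π_ref is nothing more than the output distribution of the frozen reference model, not a separate function trained to approximate it. A minimal Python sketch of this relationship, assuming a toy frozen reference model (the vocabulary, logits, and function names below are illustrative assumptions, not taken from the card):

```python
import math

# Toy "reference model": fixed logits over a small vocabulary for one prompt x.
# In RLHF the reference model's parameters (theta_ref) are frozen, so this
# distribution never changes during policy optimization.
VOCAB = ["therefore", "consequently", "however", "moreover"]
FIXED_LOGITS = [1.2, 0.3, 2.0, -0.5]  # illustrative values only

def reference_policy(logits):
    """Softmax over the frozen logits: pi_ref(y | x) for each token y."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(VOCAB, exps)}

pi_ref = reference_policy(FIXED_LOGITS)

# By definition, pi_ref's probability for a token is exactly the probability
# the frozen reference model assigns to it -- no extra training step occurs.
print(pi_ref["consequently"])
```

Because π_ref is defined as the distribution the frozen reference model already produces, whatever probability the reference model assigns to a token given the prompt is, by definition, the value of π_ref for that token.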