Learn Before
A machine learning engineer is fine-tuning a large language model using a reinforcement learning approach. They mistakenly define the loss function to be minimized as $L(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}[U(x, y)]$, where $U(x, y)$ is a utility function that returns high values for desirable outputs and low values for undesirable ones. What is the most likely outcome of this training process?
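To see why the missing negative sign matters, here is a minimal sketch (an illustration, not from the source) using a hypothetical two-output softmax policy. Doing gradient descent on $\mathbb{E}[U]$ itself, rather than on $-\mathbb{E}[U]$, drives probability mass toward the low-utility output.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy setup (hypothetical values): two candidate outputs for a fixed input x.
# U assigns a high score to output 0 (desirable), low to output 1 (undesirable).
U = [1.0, 0.0]
logits = [0.0, 0.0]   # policy parameters theta: one logit per output
lr = 0.5

for _ in range(100):
    p = softmax(logits)
    # Exact gradient of E_{y~pi_theta}[U(y)] w.r.t. the logits of a softmax:
    #   d E[U] / d logit_i = p_i * (U_i - E[U])
    expU = sum(pi * ui for pi, ui in zip(p, U))
    grad = [pi * (ui - expU) for pi, ui in zip(p, U)]
    # The engineer's mistake: treat E[U] as the loss and descend on it,
    # i.e. theta <- theta - lr * grad(E[U]), which REDUCES expected utility.
    logits = [l - lr * g for l, g in zip(logits, grad)]

p = softmax(logits)
# The probability of the undesirable output (index 1) grows toward 1.
print(p)
```

Flipping the update to `l + lr * g` (gradient ascent on expected utility, equivalently descent on $-\mathbb{E}[U]$) would instead concentrate probability on the desirable output.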
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model's policy, $\pi_\theta$, is being updated by minimizing the loss function $L(\theta) = -\mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}[U(x, y)]$, where $x$ is a given input, $y$ is an output generated by the model, and $U(x, y)$ is a utility function that assigns a high score to desirable outputs and a low score to undesirable ones. What is the direct consequence of minimizing this loss function on the model's behavior?
Deconstructing the Reinforcement Learning Loss Function
Prevalence of Advanced RL Algorithms in RLHF