Learn Before
Components of the RLHF Objective Function
In the context of fine-tuning a language model, the objective is often expressed as minimizing a loss function L(x, {y_1, y_2}, r). Briefly explain the role of each of the three main components (x, {y_1, y_2}, and r) in this optimization process.
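To make the three components concrete, the loss L(x, {y_1, y_2}, r) is often instantiated as a Bradley-Terry style pairwise objective, where r assigns a scalar score to each response and the loss penalizes the model when the preferred response does not out-score the rejected one. The sketch below is a minimal illustration under that assumption; the logistic form and the function name `pairwise_loss` are choices made here for clarity, not the only possible instantiation:

```python
import math

def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry style pairwise loss: -log sigmoid(s_w - s_l).

    score_preferred: reward r's score for the preferred response (e.g. y_1)
    score_rejected:  reward r's score for the rejected response (e.g. y_2)
    Both scores are conditioned on the same prompt x.
    """
    margin = score_preferred - score_rejected
    # Loss shrinks as the preferred response's score pulls ahead.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the two scores tie, the loss is log(2) ~ 0.693;
# a larger positive margin drives the loss toward zero.
tie = pairwise_loss(0.0, 0.0)
good_margin = pairwise_loss(2.0, 1.0)
bad_margin = pairwise_loss(1.0, 2.0)
```

Here x fixes the context both responses share, {y_1, y_2} supplies the candidate pair being compared, and r provides the scores that determine which direction the loss pushes the model.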
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being fine-tuned using a specific process: for any given input prompt, the model generates two responses. A separate, pre-trained 'reward' system then scores both responses, and the language model's parameters are adjusted to make it more likely to produce responses that receive a high score. After extensive fine-tuning with this method, developers notice the model has become very good at generating responses that are stylistically polished, highly confident, and persuasive, but are often factually incorrect. What is the most likely cause of this outcome, based on the mechanics of the described fine-tuning objective?
Components of the RLHF Objective Function
During one step of the final fine-tuning stage for a large language model, the model is given an input prompt x. It generates two different responses, y_1 and y_2. A separate, pre-trained reward system evaluates both responses and assigns a higher score to y_1 than to y_2. Based on this single event, what is the immediate goal of the optimization update applied to the language model's parameters?
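One way to picture the update described above is a single gradient-ascent step on the log-probability of the preferred response. The two-logit model below is purely illustrative (a real LLM has billions of parameters, not one logit per response), but it shows the immediate effect: after the step, the model assigns more probability to y_1 relative to y_2:

```python
import math

def softmax2(a: float, b: float) -> tuple[float, float]:
    """Probabilities over two candidate responses from their logits."""
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

# Toy "parameters": one logit per response; the model currently favours y_2.
logit_y1, logit_y2 = 0.2, 0.5

p1_before, p2_before = softmax2(logit_y1, logit_y2)

# Gradient of log p(y_1) w.r.t. the logits is (1 - p1, -p2).
# One ascent step raises y_1's logit and lowers y_2's.
lr = 1.0
logit_y1 += lr * (1.0 - p1_before)
logit_y2 += lr * (-p2_before)

p1_after, _ = softmax2(logit_y1, logit_y2)
```

After the update, `p1_after > p1_before`: the immediate goal of the step is exactly this shift of probability mass toward the higher-scored response, conditioned on the same prompt x.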