Learn Before
During one step of the final fine-tuning stage for a large language model, the model is given an input prompt x. It generates two different responses, y_1 and y_2. A separate, pre-trained reward system evaluates both responses and assigns a higher score to y_1 than to y_2. Based on this single event, what is the immediate goal of the optimization update applied to the language model's parameters?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being fine-tuned using a specific process: for any given input prompt, the model generates two responses. A separate, pre-trained 'reward' system then scores both responses, and the language model's parameters are adjusted to make it more likely to produce responses that receive a high score. After extensive fine-tuning with this method, developers notice the model has become very good at generating responses that are stylistically polished, highly confident, and persuasive, but are often factually incorrect. What is the most likely cause of this outcome, based on the mechanics of the described fine-tuning objective?
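The failure mode described above can be illustrated with a toy sketch. The setup below is entirely hypothetical: a stand-in "reward model" scores responses by surface features (assertive phrasing) rather than truth, and a REINFORCE-style update repeatedly shifts probability toward higher-scoring responses. Because the objective only ever sees the reward model's score, and that score contains no factuality term, the confident-but-wrong response ends up dominating.

```python
import math

# Hypothetical candidate responses; these stand in for model samples.
responses = {
    "hedged_correct":    "It is likely around 4,000 km, though estimates vary.",
    "confident_correct": "It is exactly 4,345 km.",
    "confident_wrong":   "It is exactly 9,999 km, a well-established fact.",
}
# Toy "policy": a distribution over the candidates, stored as logits.
logits = {name: 0.0 for name in responses}

def softmax(logit_dict):
    m = max(logit_dict.values())
    exps = {k: math.exp(v - m) for k, v in logit_dict.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def proxy_reward(text):
    # Hypothetical reward model: it scores style, not truth.
    # Confident phrasing gets a bonus; hedging gets a penalty.
    score = 0.0
    score += 2.0 if "exactly" in text else 0.0
    score += 1.0 if "well-established" in text else 0.0
    score -= 1.0 if "likely" in text or "vary" in text else 0.0
    return score

def reinforce_step(logits, lr=0.5):
    # REINFORCE-style update with the mean reward as a baseline:
    # responses scoring above average gain probability, others lose it.
    probs = softmax(logits)
    rewards = {k: proxy_reward(responses[k]) for k in logits}
    baseline = sum(probs[k] * rewards[k] for k in logits)
    for k in logits:
        logits[k] += lr * probs[k] * (rewards[k] - baseline)

for _ in range(50):
    reinforce_step(logits)

final = softmax(logits)
# The confident-but-wrong response becomes the most probable output:
# nothing in the objective ever checked the facts.
```

The point of the sketch is that the optimization is behaving exactly as specified: it maximizes the reward model's score, so any feature the reward model conflates with quality (polish, confidence) is amplified, whether or not the content is correct.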
Components of the RLHF Objective Function
During one step of the final fine-tuning stage for a large language model, the model is given an input prompt x. It generates two different responses, y_1 and y_2. A separate, pre-trained reward system evaluates both responses and assigns a higher score to y_1 than to y_2. Based on this single event, what is the immediate goal of the optimization update applied to the language model's parameters?
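As a concrete sketch of the single-step update the question asks about, the toy code below applies a pairwise preference loss of the Bradley-Terry form, L = -log sigmoid(log p(y_1) - log p(y_2)), whose gradient raises the log-probability of the preferred response y_1 relative to the dispreferred y_2. This is an illustration, not the full RLHF pipeline (which typically optimizes the reward via PPO); all names and numbers here are hypothetical.

```python
import math

# Toy setup: the "model" is reduced to two learnable scores, the
# log-probabilities it currently assigns to the two sampled responses.
logp_y1 = -1.0   # preferred response (higher reward-model score)
logp_y2 = -0.5   # dispreferred response (lower reward-model score)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pairwise_update(logp_w, logp_l, lr=0.1):
    """One gradient step on L = -log sigmoid(logp_w - logp_l),
    where the reward system ranked the 'w' response above 'l'.
    The gradient pushes logp_w up and logp_l down by the same amount."""
    margin = logp_w - logp_l
    grad = sigmoid(-margin)        # -dL/d(margin), always in (0, 1)
    return logp_w + lr * grad, logp_l - lr * grad

before = logp_y1 - logp_y2
logp_y1, logp_y2 = pairwise_update(logp_y1, logp_y2)
after = logp_y1 - logp_y2
# Immediate goal of the update: after one step, the model assigns
# relatively more probability to y_1 over y_2 than it did before.
```

Note the direction of the answer the question is probing: the update does not target any absolute notion of quality; it only shifts probability mass toward the response the reward system ranked higher, for this prompt, at this step.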