Multiple Choice

A language model is being fine-tuned using a specific process: for any given input prompt, the model generates two responses. A separate, pre-trained 'reward' system then scores both responses, and the language model's parameters are adjusted to make it more likely to produce responses that receive a high score. After extensive fine-tuning with this method, developers notice the model has become very good at generating responses that are stylistically polished, highly confident, and persuasive, but are often factually incorrect. What is the most likely cause of this outcome, based on the mechanics of the described fine-tuning objective?
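The optimization loop described above can be illustrated with a toy, REINFORCE-style sketch. Everything here is a simplifying assumption for illustration: the "policy" is just a pair of logits over two candidate responses, and the reward model's scores are hard-coded so that the polished-but-wrong response scores higher, standing in for a reward model that rewards style over factual accuracy.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "policy": logits over two candidate responses to one prompt.
# Index 0 = accurate but plain; index 1 = polished, confident, but wrong.
logits = [0.0, 0.0]

# Hypothetical reward-model scores: the proxy reward prefers the
# persuasive response regardless of its factual accuracy.
rewards = [0.2, 1.0]

lr = 0.5
for _ in range(100):
    probs = softmax(logits)
    # Gradient of expected reward E[r] = sum_i p_i * r_i with respect
    # to logit_j is p_j * (r_j - E[r]): mass shifts toward whichever
    # response the reward model scores higher.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    for i in range(len(logits)):
        logits[i] += lr * probs[i] * (rewards[i] - baseline)

probs = softmax(logits)
print(probs)
```

Running the loop drives nearly all probability mass onto the higher-scored response. Nothing in the objective references ground truth, only the reward model's score, so any systematic bias in that score (here, toward polish and confidence) is amplified by training.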


Updated 2025-10-06


Tags: Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Evaluation in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science