Learn Before
During one step of the final fine-tuning stage for a large language model, the model is given an input prompt x. It generates two different responses, y_1 and y_2. A separate, pre-trained reward system evaluates both responses and assigns a higher score to y_1 than to y_2. Based on this single event, what is the immediate goal of the optimization update applied to the language model's parameters?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being fine-tuned using a specific process: for any given input prompt, the model generates two responses. A separate, pre-trained 'reward' system then scores both responses, and the language model's parameters are adjusted to make it more likely to produce responses that receive a high score. After extensive fine-tuning with this method, developers notice the model has become very good at generating responses that are stylistically polished, highly confident, and persuasive, but are often factually incorrect. What is the most likely cause of this outcome, based on the mechanics of the described fine-tuning objective?
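The failure mode described above can be illustrated with a toy sketch. The setup below is entirely hypothetical: a stand-in "reward model" scores responses by surface features (assertive phrasing) rather than truth, and a REINFORCE-style update repeatedly shifts probability toward higher-scoring responses. Because the objective only ever sees the reward model's score, and that score contains no factuality term, the confident-but-wrong response ends up dominating.

```python
import math

# Hypothetical candidate responses; these stand in for model samples.
responses = {
    "hedged_correct":    "It is likely around 4,000 km, though estimates vary.",
    "confident_correct": "It is exactly 4,345 km.",
    "confident_wrong":   "It is exactly 9,999 km, a well-established fact.",
}
# Toy "policy": a distribution over the candidates, stored as logits.
logits = {name: 0.0 for name in responses}

def softmax(logit_dict):
    m = max(logit_dict.values())
    exps = {k: math.exp(v - m) for k, v in logit_dict.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def proxy_reward(text):
    # Hypothetical reward model: it scores style, not truth.
    # Confident phrasing gets a bonus; hedging gets a penalty.
    score = 0.0
    score += 2.0 if "exactly" in text else 0.0
    score += 1.0 if "well-established" in text else 0.0
    score -= 1.0 if "likely" in text or "vary" in text else 0.0
    return score

def reinforce_step(logits, lr=0.5):
    # REINFORCE-style update with the mean reward as a baseline:
    # responses scoring above average gain probability, others lose it.
    probs = softmax(logits)
    rewards = {k: proxy_reward(responses[k]) for k in logits}
    baseline = sum(probs[k] * rewards[k] for k in logits)
    for k in logits:
        logits[k] += lr * probs[k] * (rewards[k] - baseline)

for _ in range(50):
    reinforce_step(logits)

final = softmax(logits)
# The confident-but-wrong response becomes the most probable output:
# nothing in the objective ever checked the facts.
```

The point of the sketch is that the optimization is behaving exactly as specified: it maximizes the reward model's score, so any feature the reward model conflates with quality (polish, confidence) is amplified, whether or not the content is correct.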
Components of the RLHF Objective Function
During one step of the final fine-tuning stage for a large language model, the model is given an input prompt x. It generates two different responses, y_1 and y_2. A separate, pre-trained reward system evaluates both responses and assigns a higher score to y_1 than to y_2. Based on this single event, what is the immediate goal of the optimization update applied to the language model's parameters?
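As a concrete sketch of the single-step update the question asks about, the toy code below applies a pairwise preference loss of the Bradley-Terry form, L = -log sigmoid(log p(y_1) - log p(y_2)), whose gradient raises the log-probability of the preferred response y_1 relative to the dispreferred y_2. This is an illustration, not the full RLHF pipeline (which typically optimizes the reward via PPO); all names and numbers here are hypothetical.

```python
import math

# Toy setup: the "model" is reduced to two learnable scores, the
# log-probabilities it currently assigns to the two sampled responses.
logp_y1 = -1.0   # preferred response (higher reward-model score)
logp_y2 = -0.5   # dispreferred response (lower reward-model score)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pairwise_update(logp_w, logp_l, lr=0.1):
    """One gradient step on L = -log sigmoid(logp_w - logp_l),
    where the reward system ranked the 'w' response above 'l'.
    The gradient pushes logp_w up and logp_l down by the same amount."""
    margin = logp_w - logp_l
    grad = sigmoid(-margin)        # -dL/d(margin), always in (0, 1)
    return logp_w + lr * grad, logp_l - lr * grad

before = logp_y1 - logp_y2
logp_y1, logp_y2 = pairwise_update(logp_y1, logp_y2)
after = logp_y1 - logp_y2
# Immediate goal of the update: after one step, the model assigns
# relatively more probability to y_1 over y_2 than it did before.
```

Note the direction of the answer the question is probing: the update does not target any absolute notion of quality; it only shifts probability mass toward the response the reward system ranked higher, for this prompt, at this step.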