1Cademy - A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used as part of the input for that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?

Learn Before

Dataset Composition for RL Fine-Tuning in RLHF

Multiple Choice

A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used as part of the input for that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?

Updated 2025-09-26
