Learn Before
Contrasting Data Sourcing Methods in Model Training
A language model is being refined through a process where, for each training instance, an input prompt is selected from a collection. The model then generates a corresponding output based on its current state, and this input-output pair is immediately used for that training step. Contrast this method of obtaining the output portion of a training sample with an approach that uses a fixed, pre-written set of ideal outputs for each prompt. What is a primary advantage of the model generating its own outputs for training?
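To make the contrast concrete, here is a minimal toy sketch (all names hypothetical, no real model or reward signal) of the two data-sourcing strategies the question describes: a fixed, pre-written output per prompt versus an output sampled from the model's current state, which changes as the model is updated.

```python
import random

def fixed_dataset(prompts):
    # Offline approach: ideal outputs are written once, before training,
    # and never change as the model's parameters change.
    return {p: f"ideal answer to {p}" for p in prompts}

class ToyModel:
    """Stand-in for a language model; `version` abstracts its parameters."""
    def __init__(self):
        self.version = 0

    def generate(self, prompt):
        # On-policy approach: the response reflects the model's CURRENT state.
        return f"v{self.version} answer to {prompt}"

    def update(self):
        # One training step, abstracted away.
        self.version += 1

prompts = ["p1", "p2"]
offline = fixed_dataset(prompts)
model = ToyModel()

for step in range(3):
    prompt = random.choice(prompts)
    on_policy_pair = (prompt, model.generate(prompt))  # tracks current model
    offline_pair = (prompt, offline[prompt])           # stale after updates
    model.update()
```

The sketch illustrates the advantage hinted at by the question: on-policy pairs always reflect the behavior the current model actually produces, so the training signal is applied to the model's own output distribution rather than to stale, pre-written targets.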
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formulating the Loss Function for Policy Learning in RLHF
A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used in that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?
Policy Learning in RLHF
Comparing Data Sourcing Strategies
Contrasting Data Sourcing Methods in Model Training
Optimal Parameters Formula in RL Fine-Tuning