Generation of Candidate Outputs from Input-Only Datasets in RLHF

In Reinforcement Learning from Human Feedback (RLHF), the training process starts with a dataset that typically contains only input prompts, lacking pre-annotated outputs. To create training examples, the language model itself is used to generate a set of $N$ distinct candidate outputs, denoted $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$, for a given prompt. Each generated response $\mathbf{y}_i$ is then evaluated to provide the feedback signal used for fine-tuning the model.
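
Below is a minimal sketch of this sampling-and-scoring step, assuming a Hugging Face `transformers` setup with `gpt2` as a stand-in policy model. The `reward_fn` here is a hypothetical placeholder (a toy heuristic, not part of the source) standing in for whatever feedback signal RLHF uses, such as a trained reward model or human preference labels.

```python
# Sketch: sample N candidate outputs for one prompt, then score each.
# `gpt2` and `reward_fn` are illustrative assumptions, not the source's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

N = 4  # number of candidate outputs y_1, ..., y_N per prompt

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample N distinct candidates from the model itself; do_sample=True
# makes the candidates stochastic rather than N identical greedy outputs.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        temperature=1.0,
        max_new_tokens=64,
        num_return_sequences=N,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )

# Strip the prompt tokens so each candidate contains only the response.
prompt_len = inputs["input_ids"].shape[1]
candidates = [
    tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
    for seq in outputs
]

def reward_fn(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model; returns a scalar score."""
    return float(len(response.split()))  # toy heuristic, not a real reward

# Each candidate y_i is scored to produce the feedback signal
# that downstream RLHF fine-tuning consumes.
scored = sorted(((reward_fn(prompt, y), y) for y in candidates), reverse=True)
for score, y in scored:
    print(f"{score:6.1f}  {y[:60]!r}")
```

In practice the scalar scores feed a policy-optimization step (e.g., PPO) or a ranking-based objective; the sketch stops at scoring because that is the step this concept describes.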
