Generation of Candidate Outputs from Input-Only Datasets in RLHF

In Reinforcement Learning from Human Feedback (RLHF), the training process starts with a dataset that typically contains only input prompts, lacking pre-annotated outputs. To create training examples, the language model itself is used to generate a set of $N$ distinct candidate outputs, denoted $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$, for a given prompt. Each generated response $\mathbf{y}_i$ is then evaluated to provide the feedback signal used for fine-tuning the model.
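
Below is a minimal sketch of this sampling-and-scoring step, assuming a Hugging Face `transformers` setup with `gpt2` as a stand-in policy model. The `reward_fn` here is a hypothetical placeholder (a toy heuristic, not part of the source) standing in for whatever feedback signal RLHF uses, such as a trained reward model or human preference labels.

```python
# Sketch: sample N candidate outputs for one prompt, then score each.
# `gpt2` and `reward_fn` are illustrative assumptions, not the source's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

N = 4  # number of candidate outputs y_1, ..., y_N per prompt

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample N distinct candidates from the model itself; do_sample=True
# makes the candidates stochastic rather than N identical greedy outputs.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        temperature=1.0,
        max_new_tokens=64,
        num_return_sequences=N,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )

# Strip the prompt tokens so each candidate contains only the response.
prompt_len = inputs["input_ids"].shape[1]
candidates = [
    tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
    for seq in outputs
]

def reward_fn(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model; returns a scalar score."""
    return float(len(response.split()))  # toy heuristic, not a real reward

# Each candidate y_i is scored to produce the feedback signal
# that downstream RLHF fine-tuning consumes.
scored = sorted(((reward_fn(prompt, y), y) for y in candidates), reverse=True)
for score, y in scored:
    print(f"{score:6.1f}  {y[:60]!r}")
```

In practice the scalar scores feed a policy-optimization step (e.g., PPO) or a ranking-based objective; the sketch stops at scoring because that is the step this concept describes.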
