Examples of LLM-Generated Responses for RLHF Evaluation
In the data-collection phase of Reinforcement Learning from Human Feedback (RLHF), an LLM generates multiple distinct outputs for a single prompt by sampling from its output distribution. For instance, given the prompt 'How can I live a more environmentally friendly life?', the model might produce the following set of four responses, denoted {y_1, y_2, y_3, y_4}, for human evaluation:
- Output 1 (y_1): Consider switching to an electric vehicle or bicycle instead of traditional cars to reduce carbon emissions and protect our planet.
- Output 2 (y_2): Adopt a minimalist lifestyle. Own fewer possessions to reduce consumption and the environmental impact of manufacturing and disposal.
- Output 3 (y_3): Go off-grid. Generate your own renewable energy and collect rainwater to become completely self-sufficient and reduce reliance on non-renewable resources.
- Output 4 (y_4): Support local farm products to reduce the carbon footprint of transporting food, while enjoying fresh, healthy food.
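The sampling step described above can be sketched in a few lines of code. The snippet below is a minimal, self-contained illustration, not tied to any real model API: it stands in for an LLM with a toy probability distribution over canned completions for one prompt, and draws a set of candidate outputs y_1..y_4 of the kind that human annotators would later rank. The candidate strings, their probabilities, and the `sample_responses` helper are all invented for illustration.

```python
import random

# Toy stand-in for an LLM: a distribution over canned completions
# for one prompt. In a real system these probabilities would come
# from autoregressive decoding over the model's vocabulary.
CANDIDATES = {
    "Switch to an electric vehicle or a bicycle.": 0.35,
    "Adopt a minimalist lifestyle and consume less.": 0.25,
    "Go off-grid with renewable energy and rainwater.": 0.15,
    "Buy local farm products to cut food miles.": 0.25,
}

def sample_responses(n, seed=0):
    """Sample n candidate outputs y_1..y_n for a single prompt."""
    rng = random.Random(seed)
    texts = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    # Sampling (rather than greedy argmax decoding) is what yields
    # the distinct outputs that humans then compare and rank.
    return [rng.choices(texts, weights=weights, k=1)[0] for _ in range(n)]

outputs = sample_responses(4)
for i, y in enumerate(outputs, start=1):
    print(f"y_{i}: {y}")
```

Because sampling is stochastic, repeated draws can occasionally coincide; real pipelines often combine sampling with prompt or decoding-parameter variation to encourage distinct candidates.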
Tags
- Ch.2 Generative Models - Foundations of Large Language Models
- Foundations of Large Language Models
- Foundations of Large Language Models Course
- Computing Sciences
Related
- Evaluating Strategies for Response Diversity: A research team is collecting data for a human feedback process. They find that their instruction-tuned model, despite sampling, consistently produces outputs that are very similar in structure and content for a given prompt. Which of the following strategies would be most effective at introducing fundamentally different perspectives and conceptual variety into the generated responses?
- Generation of Candidate Outputs from Input-Only Datasets in RLHF: A team is working on collecting a dataset for human feedback and wants to ensure a wide variety of model responses for each user request. Match each technique for increasing output diversity with the scenario that best exemplifies it.
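One common knob behind the diversity techniques mentioned above is the sampling temperature. As a hedged illustration of the mechanics only (the question above concerns conceptual variety, which temperature alone may not deliver), the sketch below shows how raising the temperature flattens a softmax distribution over next tokens, so lower-probability continuations get sampled more often. The logit values are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling: p_i is proportional to exp(logit_i / T)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # invented next-token scores

low_t = softmax(logits, temperature=0.5)
high_t = softmax(logits, temperature=2.0)

# Higher temperature flattens the distribution: the top token
# dominates less, so rarer continuations are sampled more often.
print("T=0.5 top-token mass:", round(low_t[0], 3))
print("T=2.0 top-token mass:", round(high_t[0], 3))
```

In practice, temperature is often combined with top-k or nucleus (top-p) sampling and with prompt rewording when the goal is genuinely different perspectives rather than surface-level variation.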
Learn After
- Evaluating AI-Generated Responses: A key step in gathering data for Reinforcement Learning from Human Feedback (RLHF) is to have a language model generate multiple, varied responses to a single prompt. Which of the following sets of responses to the prompt 'What are the benefits of regular exercise?' best exemplifies the desired diversity and quality for this data collection process?
In a data collection process where a language model generates multiple outputs for a single prompt to be evaluated by humans, the model was given the prompt 'How can I improve my public speaking skills?' and produced the following four responses. What is the primary weakness of this set of responses for its intended purpose?
- Response A: Practice your speech in front of a mirror to get comfortable with the material.
- Response B: Rehearse your presentation multiple times to build confidence.
- Response C: Run through your talk several times before the actual event.
- Response D: Join a local public speaking club to get feedback and practice in a supportive environment.