Activity (Process)

Techniques for Generating Diverse Outputs in RLHF

In the data collection phase of RLHF, an instruction-tuned LLM generates multiple, varied responses to a single prompt. A common way to do this is to sample repeatedly from the model's output distribution, so that each draw can yield a different response. To further increase diversity in both the generated outputs and their annotations, other techniques can be used, such as generating with different LLMs, varying the prompt, or supplying different in-context demonstrations.
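The two ideas above can be sketched in a few lines of Python. This is a toy illustration, not a real RLHF pipeline: the distribution over whole candidate responses stands in for a model's token-level output distribution, and every name here (`softmax_with_temperature`, `sample_responses`, `build_generation_jobs`, the model and template strings) is hypothetical. The temperature parameter is one common sampling knob, assumed here for illustration; the source does not prescribe a specific sampling method.

```python
import itertools
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_responses(candidates, logits, n, temperature=1.0, seed=0):
    """Draw n responses (with replacement) from a toy output distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=n)


def build_generation_jobs(models, templates, demo_sets, question):
    """Cross different models, prompt templates, and in-context demonstrations.

    Each job describes one generation request; a real system would send
    job["prompt"] to job["model"] and collect the response for annotation.
    """
    jobs = []
    for model, template, demos in itertools.product(models, templates, demo_sets):
        prompt = "\n".join(list(demos) + [template.format(question=question)])
        jobs.append({"model": model, "prompt": prompt})
    return jobs


if __name__ == "__main__":
    # Hypothetical candidate responses and scores for a single prompt.
    candidates = ["response A", "response B", "response C"]
    logits = [2.0, 1.0, 0.0]
    print(sample_responses(candidates, logits, n=4, temperature=1.5))

    jobs = build_generation_jobs(
        models=["model-1", "model-2"],            # placeholder model names
        templates=["Q: {question}\nA:", "Please answer: {question}"],
        demo_sets=[[], ["Q: 2+2?\nA: 4"]],        # zero-shot and one-shot
        question="What is RLHF?",
    )
    print(len(jobs), "generation jobs")
```

Raising the temperature spreads probability mass across more candidates, so repeated draws are more likely to differ. Crossing models, templates, and demonstration sets multiplies the number of distinct generation settings, which is one simple way to widen the pool of outputs sent for annotation.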

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models