
Comparison of Rejection Sampling and RLHF

When compared to Reinforcement Learning from Human Feedback (RLHF), rejection sampling provides a significantly simpler method for integrating human preferences into the training of Large Language Models. It bypasses the more complex reinforcement learning loop in favor of a straightforward fine-tuning approach on reward-model-selected data.
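The selection-then-fine-tuning idea can be sketched as a best-of-n loop: sample several candidate completions per prompt, score each with a reward model, and keep only the top-scoring one as supervised fine-tuning data. The sketch below is illustrative only; `generate_candidates` and `reward_model` are hypothetical stand-ins for a real LLM sampler and a trained reward model.

```python
def generate_candidates(prompt, n=8):
    # Hypothetical stand-in for sampling n completions from an LLM.
    return [f"{prompt} :: completion {i}" for i in range(n)]

def reward_model(prompt, completion):
    # Hypothetical reward model; a toy deterministic score for illustration.
    return sum(ord(ch) for ch in completion) % 97

def build_sft_dataset(prompts, n=8):
    """Rejection sampling: draw n candidates per prompt, keep only the
    completion the reward model scores highest, and return the resulting
    (prompt, best_completion) pairs for ordinary supervised fine-tuning."""
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, n)
        best = max(candidates, key=lambda c: reward_model(prompt, c))
        dataset.append((prompt, best))
    return dataset
```

Because the output is a plain dataset of preferred completions, the final training step is standard supervised fine-tuning, with no policy-gradient loop, value model, or KL penalty to tune as in RLHF.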


Updated 2026-05-03

