Learn Before
Rejection Sampling for LLM Fine-Tuning
Rejection sampling is a technique for fine-tuning Large Language Models by incorporating human preferences. For each prompt, the process generates N candidate outputs (an N-best list), uses a reward model to identify the highest-quality responses among them, and then uses this curated set of 'best' outputs as the data for fine-tuning the LLM.
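As a concrete illustration, here is a minimal sketch of the curation loop described above. The generator and reward model are toy placeholders (assumptions for illustration), not a real LLM or a trained reward model:

```python
# Minimal sketch of rejection sampling for fine-tuning data curation.
# generate_candidates and reward_score are hypothetical placeholders.
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder: a real system samples n responses from the LLM,
    # typically with temperature > 0 so the candidates differ.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def reward_score(prompt: str, response: str) -> float:
    # Placeholder: a real reward model scores (prompt, response) quality.
    return random.random()

def curate_dataset(prompts: list[str], n: int = 8) -> list[tuple[str, str]]:
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, n)                    # 1. sample N outputs
        best = max(candidates, key=lambda c: reward_score(prompt, c))  # 2. score and select the best
        dataset.append((prompt, best))                                 # 3. keep only the winner
    return dataset

if __name__ == "__main__":
    # 4. The curated (prompt, best response) pairs become supervised
    #    fine-tuning data for the original LLM (training step not shown).
    print(curate_dataset(["How do I stay safe online?"]))
```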
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Input and Output Formulation in BoN Sampling
Generating N-Best Candidates in BoN Sampling
Reward Model Selection in BoN Sampling
Rejection Sampling for LLM Fine-Tuning
A company wants to improve the safety and helpfulness of its AI assistant without the high cost and time of retraining the entire base model. They propose a new system for handling user queries: for each query, the system will first generate 10 different potential responses. Then, a separate, fast-acting 'quality-scoring' model will evaluate all 10 responses based on pre-defined criteria. Finally, the system will present to the user only the single response that received the highest score (a sketch of this selection loop appears after this list). What is the most significant trade-off of this approach compared to simply using the first response the base model generates?
A system is designed to improve the quality of its generated text by producing multiple options and then picking the best one. Arrange the following steps of this process in the correct logical order.
Chatbot Response Quality Improvement
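The chatbot scenario in the list above describes best-of-N selection at inference time rather than fine-tuning. A minimal sketch of that selection loop, with hypothetical placeholder functions standing in for the base model and the quality-scoring model; the comment marks the compute trade-off the question asks about:

```python
# Minimal sketch of inference-time best-of-N selection, as in the chatbot
# scenario above. generate_response and quality_score are hypothetical
# placeholders, not a real serving stack or scoring model.
import random

def generate_response(query: str) -> str:
    # Placeholder for one sampled response from the base model.
    return f"draft answer ({random.random():.3f}) to: {query}"

def quality_score(query: str, response: str) -> float:
    # Placeholder for the fast quality-scoring model.
    return random.random()

def answer(query: str, n: int = 10) -> str:
    # Trade-off: roughly n times the generation compute per query (and
    # added latency, unless the candidates are sampled in parallel).
    candidates = [generate_response(query) for _ in range(n)]
    return max(candidates, key=lambda r: quality_score(query, r))

print(answer("How do I reset my password safely?"))
```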
Learn After
Comparison of Rejection Sampling and RLHF
Adoption of Rejection Sampling in LLMs
Analyzing a Flawed Model Improvement Pipeline
You are tasked with improving a language model's ability to generate helpful and harmless responses. You decide to use a method that involves generating multiple potential responses to a prompt, scoring them with a separate quality-assessment model, and then using only the best-scoring responses to further train the original model. Arrange the following steps of this process in the correct logical order.
A machine learning team wants to improve a base language model's ability to follow instructions. They have already trained a separate, reliable 'reward model' that can score the quality of any given response. The team wants to use this reward model to enhance the base model's performance directly through a data-centric approach, avoiding more complex training paradigms. Which of the following strategies correctly describes the most effective and direct way to use the reward model for this purpose?
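The two questions above resolve to the same pipeline: curate best-of-N responses with the reward model, then run ordinary supervised fine-tuning on them. Below is a hedged sketch of that final training step using the Hugging Face transformers and datasets libraries; the model name ('gpt2'), hyperparameters, and the example pair are illustrative assumptions, not values from the course.

```python
# Hedged sketch of supervised fine-tuning on rejection-sampled data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Curated best-of-N (prompt, response) pairs; contents are illustrative.
pairs = [("How do I stay safe online?",
          "Use strong, unique passwords and enable two-factor authentication.")]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(example):
    # Train on prompt + best response with the ordinary causal-LM objective.
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

ds = Dataset.from_list(
    [{"prompt": p, "response": r} for p, r in pairs]
).map(tokenize, remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rs-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # fine-tune the base model on only the highest-scoring responses
```

Note the data-centric design the last question points toward: the reward model never updates the LLM's weights directly; it only filters which examples the LLM is fine-tuned on.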