Learn Before
Best-of-N Sampling (BoN Sampling)
Rejection Sampling for LLM Fine-Tuning
Rejection sampling is a technique for fine-tuning Large Language Models so that they better reflect human preferences. For each prompt, the model generates N candidate outputs, a reward model scores these candidates to identify the highest-quality responses, and this curated set of 'best' outputs is then used as the training data for fine-tuning the LLM.
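To make the workflow concrete, the following is a minimal Python sketch of the data-curation loop described above. The names `generate_candidates` and `reward_model_score` are hypothetical placeholders for an LLM sampling call and a trained reward model, not real APIs; the toy stubs at the bottom only exist so the sketch runs end to end.

```python
import random
from typing import Callable, List, Tuple


def curate_finetuning_data(
    prompts: List[str],
    generate_candidates: Callable[[str, int], List[str]],
    reward_model_score: Callable[[str, str], float],
    n: int = 10,
) -> List[Tuple[str, str]]:
    """For each prompt, sample N candidates and keep only the highest-scoring one."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, n)  # step 1: generate N candidate responses
        best = max(candidates, key=lambda r: reward_model_score(prompt, r))  # step 2: reward-model ranking
        dataset.append((prompt, best))  # step 3: keep the curated (prompt, best response) pair
    return dataset


# Toy stand-ins so the sketch is runnable; a real setup would call an LLM
# sampler and a trained reward model instead.
def generate_candidates(prompt: str, n: int) -> List[str]:
    return [f"{prompt} [candidate {i}]" for i in range(n)]


def reward_model_score(prompt: str, response: str) -> float:
    return random.random()


if __name__ == "__main__":
    data = curate_finetuning_data(
        ["Explain rejection sampling."], generate_candidates, reward_model_score, n=4
    )
    print(data)
```

The resulting (prompt, best response) pairs are then used for an ordinary supervised fine-tuning pass; only the keep-the-best selection step distinguishes this data from a standard instruction-tuning set.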
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Input and Output Formulation in BoN Sampling
Generating N-Best Candidates in BoN Sampling
Reward Model Selection in BoN Sampling
Rejection Sampling for LLM Fine-Tuning
A company wants to improve the safety and helpfulness of its AI assistant without the high cost and time of retraining the entire base model. They propose a new system for handling user queries: for each query, the system will first generate 10 different potential responses. Then, a separate, fast-acting 'quality-scoring' model will evaluate all 10 responses based on pre-defined criteria. Finally, the system will present only the single response that received the highest score to the user. What is the most significant trade-off of this approach compared to simply using the first response the base model generates?
Learn After
Comparison of Rejection Sampling and RLHF
Adoption of Rejection Sampling in LLMs
Analyzing a Flawed Model Improvement Pipeline