Best-of-N Sampling (BoN Sampling)
Best-of-N (BoN) sampling is a technique in which a model generates N candidate outputs for the same input and a reward model scores each candidate so that the highest-scoring one can be selected. It is most commonly used for inference-time alignment through reranking, but the same generate-then-select mechanism can also be adapted for training, for example in rejection sampling.
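The generate-then-select loop can be illustrated with a minimal Python sketch. Everything named here is hypothetical and exists only for illustration: `best_of_n`, `make_toy_generator`, `toy_reward`, and `collect_training_pairs` are not part of any library, the toy generator stands in for sampling from a language model, and the word-overlap score stands in for a learned reward model.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces: a generator maps a prompt to one sampled response,
# and a reward function maps (prompt, response) to a scalar score.
Generator = Callable[[str], str]
RewardFn = Callable[[str, str], float]


def best_of_n(
    prompt: str, generate: Generator, reward: RewardFn, n: int = 4
) -> Tuple[str, List[Tuple[str, float]]]:
    """Sample n candidates, score each one, and return the top-scoring
    candidate together with every (candidate, score) pair."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scored = [(c, reward(prompt, c)) for c in candidates]
    best, _ = max(scored, key=lambda pair: pair[1])
    return best, scored


def make_toy_generator(seed: int) -> Generator:
    """Stand-in for an LLM sampler: draws a canned response at random."""
    rng = random.Random(seed)
    pool = [
        "Paris.",
        "The capital of France is Paris.",
        "I am not sure.",
        "France is a country in Europe.",
    ]
    return lambda prompt: rng.choice(pool)


def toy_reward(prompt: str, response: str) -> float:
    """Stand-in for a reward model: the fraction of prompt words that
    reappear in the response."""
    prompt_words = {w.strip("?.,").lower() for w in prompt.split()}
    response_words = {w.strip("?.,").lower() for w in response.split()}
    return len(prompt_words & response_words) / max(len(prompt_words), 1)


def collect_training_pairs(
    prompts: List[str], generate: Generator, reward: RewardFn, n: int = 4
) -> List[Tuple[str, str]]:
    """Training-side variant in the spirit of rejection sampling: keep only
    the best candidate per prompt as a (prompt, response) fine-tuning pair."""
    return [(p, best_of_n(p, generate, reward, n)[0]) for p in prompts]


if __name__ == "__main__":
    question = "What is the capital of France?"
    best, scored = best_of_n(question, make_toy_generator(0), toy_reward, n=5)
    for candidate, score in scored:
        print(f"{score:.2f}  {candidate}")
    print("selected:", best)
```

In this sketch the base model is never updated: quality improves only because N samples are drawn and scored, so the cost of each query grows roughly linearly with N, with one reward evaluation added per candidate.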
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Using Scoring Systems for Inference-Time Rescoring
Use of Reranking to Explore Model and Search Errors
The Challenge of Candidate Diversity in Reranking Methods
A development team uses a large, pre-trained language model to generate summaries of news articles. To improve the factual accuracy of the final output, their system first generates five different summary candidates. Then, a separate, specialized scoring model evaluates each of the five summaries for factual consistency with the original article and selects the one with the highest score. Which statement best analyzes the trade-offs of this approach?
Improving Chatbot Responses on a Budget
A system is designed to improve the quality of its generated responses at inference time without altering the base model's parameters. It does this by producing several options and then choosing the best one. Arrange the following actions into the correct operational sequence.
Learn After
Input and Output Formulation in BoN Sampling
Generating N-Best Candidates in BoN Sampling
Reward Model Selection in BoN Sampling
Rejection Sampling for LLM Fine-Tuning
A company wants to improve the safety and helpfulness of its AI assistant without the high cost and time of retraining the entire base model. They propose a new system for handling user queries: for each query, the system will first generate 10 different potential responses. Then, a separate, fast-acting 'quality-scoring' model will evaluate all 10 responses based on pre-defined criteria. Finally, the system will present only the single response that received the highest score to the user. What is the most significant trade-off of this approach compared to simply using the first response the base model generates?
A system is designed to improve the quality of its generated text by producing multiple options and then picking the best one. Arrange the following steps of this process in the correct logical order.
Chatbot Response Quality Improvement