Learn Before
Best Candidate Selection via Maximum Reward Score in BoN Sampling
In Best-of-N (BoN) sampling, once a set of N candidate outputs ŷ_1, ..., ŷ_N has been generated for an input x, a reward model r evaluates each one. The final output, ŷ_best, is the candidate that receives the highest score. This selection process is represented by the formula: ŷ_best = max{r(x, ŷ_1), ..., r(x, ŷ_N)}. Although the max function technically returns the highest score value, this notation is commonly used as shorthand for selecting the argument (the candidate ŷ_i) that yields this maximum score.
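The difference between the maximum score and the candidate that achieves it can be made concrete with a minimal Python sketch. Everything named here is an assumption for illustration: `select_best` is a hypothetical helper, and `toy_reward` is a stand-in word-overlap scorer, not a real reward model.

```python
from typing import Callable, Sequence, Tuple


def select_best(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the highest-scoring candidate together with its score."""
    if not candidates:
        raise ValueError("BoN selection needs at least one candidate")
    # Score every candidate: r(x, y_1), ..., r(x, y_N).
    scores = [reward_fn(prompt, c) for c in candidates]
    # max over the scores would give only a number; taking the index of the
    # largest score (an argmax) recovers the candidate itself.
    # On ties, max() keeps the first index it encounters.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def toy_reward(prompt: str, candidate: str) -> float:
    """Hypothetical stand-in reward: count candidate words found in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(w in prompt_words for w in candidate.lower().split()))


candidates = [
    "sampling is random",
    "best of n sampling picks the best candidate",
    "hello",
]
best, score = select_best("explain best of n sampling", candidates, toy_reward)
print(best, score)
```

Here `score` is what the max function returns in the strict sense, while `best` is the output that BoN sampling actually hands back to the user.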

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Best Candidate Selection via Maximum Reward Score in BoN Sampling
An AI system generates four possible summaries for a user's request. A scoring mechanism then evaluates each summary for quality, assigning a numerical score where higher is better. Based on the scores below, which summary would be selected as the final output?
- Summary A: Score 0.85
- Summary B: Score -0.20
- Summary C: Score 1.50
- Summary D: Score 1.15
An AI system is designed to generate helpful and safe responses. For a given prompt, it first creates three distinct candidate responses. A secondary component then scores each candidate for helpfulness and safety, and the response with the highest score is selected as the final output. If the system ultimately produces a response that is factually incorrect and unhelpful, which of the following is the most likely point of failure in the process?
Consider a system that first generates a diverse set of potential answers to a prompt and then uses a separate scoring component to select the single best answer to show the user. In this system, the quality of the final, user-facing answer is determined exclusively by the quality of the initial set of potential answers.
Learn After
Argmax Formula for Best Candidate Selection in BoN Sampling
A system generates four candidate outputs in response to a user's prompt. A separate evaluation model then assigns a quality score to each candidate, where a higher score indicates a better response. The system's selection rule is to choose the candidate that receives the maximum score. Given the scores below, which candidate will be selected as the final output?
- Candidate A: Score = 0.85
- Candidate B: Score = 0.91
- Candidate C: Score = 0.74
- Candidate D: Score = 0.23
Consider the process of selecting the best output from a set of N candidates, where a reward model r scores each candidate ŷ_i based on an input x. The selection is represented by the formula: ŷ_best = max{r(x, ŷ_1), ..., r(x, ŷ_N)}. This formula implies that the final output, ŷ_best, is a numerical value representing the highest score.
Diagnosing a Mismatch in Automated Selection