Argmax Formula for Best Candidate Selection in BoN Sampling
The formal method for selecting the best candidate, ŷ_best, from a set of N options in Best-of-N (BoN) sampling is the argmax operator. Given an input x, it returns the candidate ŷ_i that receives the highest score from the reward model r. The formula is: ŷ_best = argmax_{ŷ ∈ {ŷ_1, ..., ŷ_N}} r(x, ŷ). Unlike the max function, which returns the maximum score itself, argmax returns the input argument (the candidate sequence) that achieves that score.
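The distinction between max and argmax can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names best_of_n and toy_reward are hypothetical, and toy_reward is only a stand-in for a real reward model (it scores a candidate by its length).

```python
from typing import Callable, Sequence


def best_of_n(prompt: str, candidates: Sequence[str],
              reward: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest reward (argmax), not the reward itself."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max(..., key=...) returns the element that maximizes the key, i.e. an argmax.
    return max(candidates, key=lambda y: reward(prompt, y))


def toy_reward(prompt: str, candidate: str) -> float:
    # Hypothetical stand-in for a reward model: longer answers score higher.
    return float(len(candidate))


prompt = "Is water wet?"
candidates = ["Yes.", "Yes, it is.", "Yes, it certainly is."]

best = best_of_n(prompt, candidates, toy_reward)               # argmax: a candidate string
best_score = max(toy_reward(prompt, y) for y in candidates)    # max: a number
```

Here best holds a candidate sequence while best_score holds only a number, which is why BoN selection is written with argmax.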
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A system generates four candidate outputs in response to a user's prompt. A separate evaluation model then assigns a quality score to each candidate, where a higher score indicates a better response. The system's selection rule is to choose the candidate that receives the maximum score. Given the scores below, which candidate will be selected as the final output?
- Candidate A: Score = 0.85
- Candidate B: Score = 0.91
- Candidate C: Score = 0.74
- Candidate D: Score = 0.23
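The selection rule in this question can be checked with a short sketch. The scores are copied from the list above; storing them in a dictionary keyed by candidate label is an illustrative choice, not something the question specifies.

```python
# Scores copied from the question; the dict layout is an assumption for illustration.
scores = {"A": 0.85, "B": 0.91, "C": 0.74, "D": 0.23}

# Iterating a dict yields its keys, so max with key=scores.get returns the
# label whose score is largest (an argmax over candidate labels).
selected = max(scores, key=scores.get)
print(f"Selected candidate: {selected}")
```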
Consider the process of selecting the best output from a set of N candidates, where a reward model r scores each candidate ŷ_i based on an input x. The selection is represented by the formula: ŷ_best = max{r(x, ŷ_1), ..., r(x, ŷ_N)}. This formula implies that the final output, ŷ_best, is a numerical value representing the highest score.
Diagnosing a Mismatch in Automated Selection
Learn After
A system generates four candidate text sequences to complete a prompt. A scoring function, r(sequence), evaluates the quality of each candidate. The system uses the argmax operator to select the best one based on these scores: best_sequence = argmax(r(sequence_i)). Given the following candidates and their scores, what is the output of the argmax operation?
- Candidate A: "The cat sat on the mat." (Score: 0.82)
- Candidate B: "A feline rested on the rug." (Score: 0.91)
- Candidate C: "The mat was under the cat." (Score: 0.75)
- Candidate D: "On the mat, a cat sat." (Score: 0.89)
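When candidates and scores are kept in parallel lists, argmax is often computed as an index that is then mapped back to the sequence. The sketch below assumes that layout; the variable names are illustrative and the scores are taken from the list above.

```python
candidates = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "The mat was under the cat.",
    "On the mat, a cat sat.",
]
scores = [0.82, 0.91, 0.75, 0.89]  # r(sequence_i), in the same order as candidates

# Index-based argmax: find the position of the top score, then look up the sequence.
best_index = max(range(len(scores)), key=scores.__getitem__)
best_sequence = candidates[best_index]
print(best_sequence)
```

The result is a text sequence, not a score, which matches the definition of argmax used in the question.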
Debugging a Candidate Selection Script
Distinguishing max and argmax in Candidate Selection