1Cademy - Argmax Formula for Best Candidate Selection in BoN Sampling

Learn Before

Best Candidate Selection via Maximum Reward Score in BoN Sampling

Formula

Argmax Formula for Best Candidate Selection in BoN Sampling

In Best-of-N (BoN) sampling, the best candidate, $\hat{\mathbf{y}}_{\text{best}}$, is selected from a set of $N$ candidates using the argmax operator. This operator returns the candidate $\hat{\mathbf{y}}_i$ that receives the highest score from the reward model $r$ for the input $\mathbf{x}$. The formula is:

$$\hat{\mathbf{y}}_{\text{best}} = \underset{\hat{\mathbf{y}}_i \in \{\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_N\}}{\text{argmax}} \; r(\mathbf{x}, \hat{\mathbf{y}}_i)$$

Unlike the max function, which returns the maximum score itself, argmax returns the input argument (the candidate sequence) that attains that score.
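The selection rule can be illustrated with a minimal Python sketch. The names `select_best` and `toy_reward` are hypothetical and not from the source; `toy_reward` simply scores a candidate by its length and stands in for a real reward model $r(\mathbf{x}, \hat{\mathbf{y}}_i)$. The sketch also contrasts argmax (which returns the candidate) with max (which returns the score).

```python
from typing import Callable, Sequence


def select_best(
    prompt: str,
    candidates: Sequence[str],
    reward: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest reward (the argmax)."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # max() with a key function behaves like argmax over the candidates:
    # it returns the element itself, not its score. On ties, Python's
    # max() returns the first maximal element it encounters.
    return max(candidates, key=lambda y: reward(prompt, y))


def toy_reward(prompt: str, candidate: str) -> float:
    """Placeholder reward (assumption): longer candidates score higher."""
    return float(len(candidate))


prompt = "q"
candidates = ["ok", "a longer answer", "mid one"]

# argmax: the candidate that attains the highest score
best = select_best(prompt, candidates, toy_reward)

# max: the highest score itself
best_score = max(toy_reward(prompt, y) for y in candidates)
```

Here `best` holds a candidate sequence while `best_score` holds a number, which is the distinction between argmax and max. In practice the candidates would be the $N$ sampled outputs and the reward function a trained reward model.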


Updated 2026-06-27


References

Reference of Foundations of Large Language Models Course
