Formula

Best Candidate Selection via Maximum Reward Score in BoN Sampling

In Best-of-N (BoN) sampling, once a set of $N$ candidate outputs has been generated, a reward model $r$ scores each one. The final output, $\hat{\mathbf{y}}_{\text{best}}$, is the candidate that receives the highest score. This selection is commonly written as:

$$\hat{\mathbf{y}}_{\text{best}} = \max\{r(\mathbf{x}, \hat{\mathbf{y}}_1), \ldots, r(\mathbf{x}, \hat{\mathbf{y}}_N)\}$$

Although the $\max$ function technically returns the highest score value rather than a candidate, this notation is commonly used as shorthand for selecting the argument (the candidate $\hat{\mathbf{y}}_i$) that yields the maximum score. Written precisely, the selection is $\hat{\mathbf{y}}_{\text{best}} = \arg\max_{\hat{\mathbf{y}} \in \{\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_N\}} r(\mathbf{x}, \hat{\mathbf{y}})$.
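The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `best_of_n` and `toy_reward` are hypothetical, and the toy reward (which simply prefers longer strings) stands in for a trained reward model $r$. The point is that the code returns the candidate itself, not its score.

```python
from typing import Callable, Sequence


def best_of_n(
    x: str,
    candidates: Sequence[str],
    reward: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest reward score for input x."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Score every candidate once: scores[i] = r(x, y_i).
    scores = [reward(x, y) for y in candidates]
    # Take the index of the maximum score (an argmax), then return
    # the candidate at that index. Ties go to the earliest candidate.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


def toy_reward(x: str, y: str) -> float:
    # Hypothetical stand-in for a reward model: longer outputs score higher.
    return float(len(y))


candidates = ["ok", "a longer answer", "mid"]
best = best_of_n("prompt", candidates, toy_reward)
```

Here `best` holds the selected candidate string, while `max(scores)` alone would yield only the winning score, which is the distinction the shorthand notation glosses over.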

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences