Formula

Argmax Formula for Best Candidate Selection in BoN Sampling

The formal method for selecting the best candidate, $\hat{\mathbf{y}}_{\text{best}}$, from a set of $N$ options in Best-of-N (BoN) sampling is to use the argmax operator. This operator identifies and returns the specific candidate, $\hat{\mathbf{y}}_i$, that produces the highest score from the reward model, $r$. The formula is expressed as:

$$\hat{\mathbf{y}}_{\text{best}} = \underset{\hat{\mathbf{y}}_i \in \{\hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_N\}}{\operatorname{argmax}} \ r(\mathbf{x}, \hat{\mathbf{y}}_i)$$

Unlike the max function, which returns the maximum score itself, argmax returns the input argument (the candidate sequence) that leads to that score.
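The difference between argmax and max can be illustrated with a minimal Python sketch. The names `best_of_n` and `toy_reward` are hypothetical, and `toy_reward` is only an illustrative stand-in for a reward model $r(\mathbf{x}, \hat{\mathbf{y}})$ (it counts prompt words that reappear in the candidate); a real BoN setup would score candidates with a trained reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    x: str,
    candidates: Sequence[str],
    reward: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest reward (argmax), not the score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max(..., key=...) returns the element that maximizes the key,
    # which is exactly the argmax over the candidate set.
    return max(candidates, key=lambda y: reward(x, y))


def toy_reward(x: str, y: str) -> float:
    """Hypothetical reward: number of candidate words that occur in the prompt."""
    prompt_words = set(x.lower().split())
    return float(sum(1 for w in y.lower().split() if w in prompt_words))


prompt = "capital of France"
candidates = ["Paris", "The capital of France is Paris", "France is large"]

scores = [toy_reward(prompt, y) for y in candidates]   # r(x, y_i) for each i
best = best_of_n(prompt, candidates, toy_reward)       # argmax: a candidate
best_score = max(scores)                               # max: a score

print(best)        # the selected candidate sequence
print(best_score)  # the reward it achieved
```

Here `best` is a candidate string while `best_score` is a number, mirroring the argmax/max distinction. If several candidates tie for the top score, Python's `max` returns the first one encountered; how ties are broken in practice is a design choice the formula itself leaves open.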

Updated 2026-05-02

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences