Combined Reward Formula
The combined reward, $\bar{r}$, is calculated by taking a weighted average of the outputs from different reward models. Each individual reward model's output, $r_k$, is multiplied by a weight $w_k$. These products are summed over all $K$ models, and the result is normalized by dividing by $K$. The formula is expressed as:

$$\bar{r} = \frac{1}{K} \sum_{k=1}^{K} w_k \cdot r_k$$
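As a minimal sketch of this formula (assuming, as stated above, that the weighted sum is normalized by the number of models $K$ rather than by the sum of the weights):

```python
def combined_reward(scores, weights):
    """Weighted combination of K reward-model scores, normalized by K."""
    assert len(scores) == len(weights), "one weight per reward model"
    K = len(scores)
    return sum(w * r for w, r in zip(weights, scores)) / K

# Example: three reward models with equal weights of 1.0
# reduces to a simple average of the scores.
print(combined_reward([8.0, 9.0, 7.0], [1.0, 1.0, 1.0]))  # 8.0
```

With all weights equal to 1.0, the formula collapses to the plain arithmetic mean, which is the special case used in the unweighted questions below.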

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Combined Reward Formula
An AI development team is using an ensemble of three separate models to evaluate a single generated response. The first model gives the response a score of 8.0, the second model gives it a score of 9.0, and the third model gives it a score of 7.0. To create a more robust and stable final evaluation, the team decides to use a simple averaging method. What is the final combined score for the response?
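The simple averaging method this question describes can be checked with a one-line computation (scores taken from the question text):

```python
# Unweighted mean of the three reward-model scores
scores = [8.0, 9.0, 7.0]
average = sum(scores) / len(scores)
print(average)  # 8.0
```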
An AI development team is using three specialized reward models to evaluate generated text: one for general helpfulness, one for factual accuracy, and one for safety. They combine the outputs of these models by taking a simple, unweighted average to produce a single final score. What is the most significant potential drawback of this specific approach?
Evaluating a Chatbot's Response Score
Learn After
Using Combined Reward for Policy Supervision
An AI alignment team is evaluating a language model's response using three distinct reward models: Helpfulness, Harmlessness, and Conciseness. For a specific response, the models provide the following scores and are assigned the following weights:
- Helpfulness: Score = 8.0, Weight = 2.0
- Harmlessness: Score = 9.0, Weight = 3.0
- Conciseness: Score = 6.0, Weight = 1.0
Using the weighted average formula for combining rewards, what is the final aggregated reward score for this response? (Assume K is the total number of models).
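A sketch of the computation, assuming (per the question's note) that the weighted sum is divided by $K$, the number of models, rather than by the sum of the weights:

```python
# Scores and weights from the question:
# Helpfulness, Harmlessness, Conciseness
scores = [8.0, 9.0, 6.0]
weights = [2.0, 3.0, 1.0]
K = len(scores)  # K = 3 models

# Weighted sum: 2*8 + 3*9 + 1*6 = 49, then normalize by K
reward = sum(w * r for w, r in zip(weights, scores)) / K
print(round(reward, 2))  # 16.33
```

Note that dividing by $K$ (rather than by the sum of the weights, 6.0) can yield a value outside the individual models' score range, as it does here.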
Adjusting Chatbot Behavior via Reward Model Weighting
Component Analysis of the Combined Reward Formula