1Cademy - A machine learning engineer is training a reward model to align with human preferences. The objective is to find the set of parameters, denoted by φ, that minimizes the loss function L_r(φ). After testing several parameter sets, the engineer recorded the following results: - φ_1: L_r(φ_1) = 0.69 - φ_2: L_r(φ_2) = 0.35 - φ_3: L_r(φ_3) = 0.51 - φ_4: L_r(φ_4) = 0.42 Given the optimization goal expressed as φ̂ = arg min_φ L_r(φ), which parameter set should the engineer select as the optimal one?

Learn Before

Optimal Reward Model Parameter Estimation

Multiple Choice

A machine learning engineer is training a reward model to align with human preferences. The objective is to find the set of parameters, denoted by φ, that minimizes the loss function L_r(φ). After testing several parameter sets, the engineer recorded the following results:

φ_1: L_r(φ_1) = 0.69
φ_2: L_r(φ_2) = 0.35
φ_3: L_r(φ_3) = 0.51
φ_4: L_r(φ_4) = 0.42

Given the optimization goal expressed as φ̂ = arg min_φ L_r(φ), which parameter set should the engineer select as the optimal one?

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related