Multiple Choice

A machine learning engineer is training a reward model to align with human preferences. The objective is to find the set of parameters, denoted by φ, that minimizes the loss function L_r(φ). After testing several parameter sets, the engineer recorded the following results:

  • φ_1: L_r(φ_1) = 0.69
  • φ_2: L_r(φ_2) = 0.35
  • φ_3: L_r(φ_3) = 0.51
  • φ_4: L_r(φ_4) = 0.42

Given the optimization goal expressed as φ̂ = arg min_φ L_r(φ), which parameter set should the engineer select as the optimal one?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science