Multiple Choice

A machine learning engineer is training a reward model where the goal is to align the model's predicted scores, $r(\mathbf{x}, \mathbf{y})$, with human-provided scores, $\varphi(\mathbf{x}, \mathbf{y})$. The standard approach is to maximize the objective $\mathcal{L} = -\mathbb{E}\big[(\varphi(\mathbf{x}, \mathbf{y}) - r(\mathbf{x}, \mathbf{y}))^2\big]$, i.e., to minimize the mean squared error. Suppose the engineer makes a mistake and instead configures the training process to maximize the mean squared error itself, effectively dropping the negative sign from the objective: $\mathcal{L}_{\text{mistake}} = \mathbb{E}\big[(\varphi(\mathbf{x}, \mathbf{y}) - r(\mathbf{x}, \mathbf{y}))^2\big]$. What would be the most likely effect on the model's behavior during training?
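To make the setup concrete, here is a minimal sketch of the sign error under gradient-based training. It assumes a toy linear reward model on synthetic data; the model, data, and hyperparameters are illustrative, not from the source.

```python
import torch

# Toy setup (hypothetical): a linear "reward model" fit to scalar human scores.
torch.manual_seed(0)
x = torch.randn(64, 4)                               # features of (x, y) pairs
phi = x @ torch.tensor([1.0, -0.5, 0.3, 0.8])        # stand-in human scores

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(5):
    r = model(x).squeeze(-1)                 # predicted rewards r(x, y)
    mse = torch.mean((phi - r) ** 2)         # E[(phi - r)^2]
    # Correct objective: maximize -MSE, i.e. hand the optimizer `mse` to minimize.
    # The mistake: maximizing MSE. Since optimizers *minimize* the loss,
    # minimizing -mse is exactly gradient ascent on the MSE:
    loss = -mse                              # the sign error
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: MSE = {mse.item():.3f}")
```

Because each update ascends the MSE, the printed error grows step over step: the model is pushed away from the human scores, and the parameters diverge rather than converge.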


Tags

Ch.4 Alignment - Foundations of Large Language Models

Analysis in Bloom's Taxonomy