Short Answer

Analysis of Pointwise Rating Loss Behavior

A reward model is being trained using the objective function L = -(s - r)^2, where s is the target score and r is the model's predicted reward. During an evaluation step, you observe the following predictions for two different data points:

  • Data Point 1: Target score s_1 = 7, Model prediction r_1 = 6
  • Data Point 2: Target score s_2 = 4, Model prediction r_2 = 5

Compare the loss values (L_1 and L_2) for these two data points. Based on this comparison, explain what this specific objective function incentivizes the model to do regarding the magnitude versus the direction of its prediction errors.
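The comparison can be checked numerically. The sketch below is a minimal illustration, not part of the original question: the function name `pointwise_objective` is hypothetical, and the formula is taken literally as written above, including the leading minus sign. The question does not say whether this quantity is maximized or minimized; read literally, it reaches its largest value, 0, when r = s.

```python
def pointwise_objective(s: float, r: float) -> float:
    """Objective from the question, taken literally: L = -(s - r)^2."""
    return -((s - r) ** 2)


# Data Point 1: the model under-predicts by 1 (s_1 = 7, r_1 = 6).
l1 = pointwise_objective(7, 6)

# Data Point 2: the model over-predicts by 1 (s_2 = 4, r_2 = 5).
l2 = pointwise_objective(4, 5)

print(l1, l2)    # both values are -1
print(l1 == l2)  # True: the sign of the error does not affect L
```

Because the error is squared, an under-prediction and an over-prediction of the same size produce the same value. The objective therefore depends only on the magnitude of the error, not its direction.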

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science