Short Answer

Analyzing Reward Model Performance via Loss Function

A machine learning engineer is training a reward model with a pairwise ranking loss. They observe that the loss stays consistently high and does not decrease. The cause is that the model assigns nearly identical reward scores to the preferred and dispreferred responses for any given input. Referring to the components of the loss function Loss = -E[log(Sigmoid(R_preferred - R_dispreferred))], explain why this scenario leads to a high loss value.
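To make the components of the formula concrete, here is a minimal numeric sketch, assuming scalar reward scores and a plain average standing in for the expectation. The function names `pairwise_ranking_loss` and `mean_loss` are hypothetical and chosen only for illustration. When the two scores are nearly equal, the difference is close to 0, Sigmoid(0) = 0.5, and each pair contributes about -log(0.5) = ln 2 ≈ 0.693 regardless of the input.

```python
import math

def pairwise_ranking_loss(r_preferred, r_dispreferred):
    """Per-pair loss: -log(sigmoid(r_preferred - r_dispreferred))."""
    margin = r_preferred - r_dispreferred
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
    # when the margin is a large negative number.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def mean_loss(pairs):
    """Average the per-pair loss, approximating the expectation E[...]."""
    return sum(pairwise_ranking_loss(p, d) for p, d in pairs) / len(pairs)

# Nearly identical scores: every margin is close to 0, so the loss
# sits near ln 2 (about 0.693) for every pair.
collapsed = [(1.00, 1.00), (0.50, 0.49), (-2.00, -2.01)]

# Well-separated scores: margins of +3 give a per-pair loss of about 0.049.
separated = [(4.0, 1.0), (2.0, -1.0), (0.0, -3.0)]

print(mean_loss(collapsed))
print(mean_loss(separated))
```

The loss approaches 0 only as the margin R_preferred - R_dispreferred grows large and positive, so a model that does not separate the two responses stays stuck near ln 2.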

Updated 2025-10-09

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science