Explaining Score Discrepancies in Trained Models
An AI development team trains two separate models, Model X and Model Y, on the exact same dataset of human preferences. The training objective for both models is to assign a higher score to the preferred response in each pair. After training, both models achieve perfect accuracy on the training set. However, when the team inspects the models, they find that for a specific response, Model X assigns a score of 1.5, while Model Y assigns a score of 25.0. Explain, in the context of the training objective, how it is possible for both models to be considered perfectly trained despite producing such different absolute scores for the same input.
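A minimal sketch of the idea behind this question (the scores, responses, and quality values are hypothetical, not from the course): a pairwise preference objective only constrains the *ordering* of scores within each pair, so any order-preserving rescaling of a perfect scorer is also perfect, even though the absolute scores differ.

```python
# Illustrative sketch: pairwise training only checks orderings, not magnitudes.

def pairwise_accuracy(score, pairs):
    """Fraction of (preferred, rejected) pairs where the preferred item scores higher."""
    return sum(score(p) > score(r) for p, r in pairs) / len(pairs)

# Hypothetical responses identified by a latent "quality" level 0..3;
# every preference pair prefers the higher-quality response.
pairs = [(3, 1), (2, 0), (3, 0), (2, 1), (1, 0), (3, 2)]

model_x = lambda q: 0.5 * q        # top response scores 1.5
model_y = lambda q: (25.0 / 3) * q # top response scores 25.0

print(pairwise_accuracy(model_x, pairs))  # 1.0
print(pairwise_accuracy(model_y, pairs))  # 1.0
```

Both models rank every pair correctly, so both satisfy the training objective perfectly; the objective simply never pins down the absolute scale.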
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Role of Regularization in Mitigating Reward Model Underdetermination
Reward Transformation Formula
A research team is training a model to score the quality of text responses. The training data consists of pairs of responses, where for each pair, one is labeled as 'better' than the other. The model's objective is to assign a higher score to the 'better' response in every pair. The team successfully trains two models, Model A and Model B. They discover that the internal parameters of Model A and Model B are significantly different. However, both models achieve 100% accuracy on the training data, correctly assigning a higher score to the 'better' response in every single pair. What fundamental principle of model training does this outcome best demonstrate?
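One way to see the principle this question points at (a sketch with made-up scores, not the course's code): applying any strictly increasing transform to a perfect scorer yields a second "model" with very different score values, as if from different parameters, yet identical pairwise accuracy.

```python
import math

def accuracy(scores, pairs):
    """Pairwise accuracy of a score table over (better, worse) pairs."""
    return sum(scores[b] > scores[w] for b, w in pairs) / len(pairs)

# Hypothetical scores from a trained "Model A" over four responses.
scores_a = {"r0": -1.0, "r1": 0.2, "r2": 0.9, "r3": 4.0}
pairs = [("r3", "r1"), ("r2", "r0"), ("r1", "r0"), ("r3", "r2")]

# "Model B": a strictly increasing transform of A's scores — the values
# differ everywhere, but every ordering is preserved.
scores_b = {k: math.exp(v) + 7.0 for k, v in scores_a.items()}

print(accuracy(scores_a, pairs))  # 1.0
print(accuracy(scores_b, pairs))  # 1.0
```

Since infinitely many score functions induce the same ordering, the pairwise objective underdetermines the model: many different parameter settings are equally "perfect" solutions.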
Analyzing Reward Model Discrepancies