Learn Before
Short Answer

Explaining Score Discrepancies in Trained Models

An AI development team trains two separate models, Model X and Model Y, on the exact same dataset of human preferences. The training objective for both models is to assign a higher score to the preferred response in each pair. After training, both models achieve perfect accuracy on the training set. However, when the team inspects the models, they find that for a specific response, Model X assigns a score of 1.5, while Model Y assigns a score of 25.0. Explain, in the context of the training objective, how it is possible for both models to be considered perfectly trained despite producing such different absolute scores for the same input.
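One way to see the point of the question is a small numeric sketch. It assumes the training objective is a Bradley-Terry style pairwise loss, -log(sigmoid(s_preferred - s_rejected)), which the question does not state explicitly. The response names, the scores other than 1.5 and 25.0, and the offset of 23.5 are hypothetical values chosen for illustration. Under that assumption the loss depends only on score differences within a pair, so adding a constant to every score changes neither the loss nor the accuracy.

```python
import math

def pairwise_loss(score_preferred, score_rejected):
    # Assumed objective: -log(sigmoid(difference)), written in a stable form.
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))

def accuracy(scores, pairs):
    # Fraction of pairs where the preferred response gets the higher score.
    correct = sum(1 for win, lose in pairs if scores[win] > scores[lose])
    return correct / len(pairs)

def mean_loss(scores, pairs):
    # Average pairwise loss over all (preferred, rejected) pairs.
    return sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)

# Hypothetical preference data: each tuple is (preferred, rejected).
pairs = [("a", "b"), ("b", "c"), ("a", "c")]

# Hypothetical scores from Model X; response "a" scores 1.5.
scores_x = {"a": 1.5, "b": 0.5, "c": -0.5}

# Model Y modeled as Model X shifted by a constant (an illustrative assumption).
OFFSET = 23.5
scores_y = {name: value + OFFSET for name, value in scores_x.items()}

acc_x, acc_y = accuracy(scores_x, pairs), accuracy(scores_y, pairs)
loss_x, loss_y = mean_loss(scores_x, pairs), mean_loss(scores_y, pairs)

print(scores_x["a"], scores_y["a"])  # the same response, two absolute scores
print(acc_x, acc_y)                  # both models rank every pair correctly
print(loss_x, loss_y)                # the losses match because differences match
```

A constant shift is only one possibility. Two models could also differ in how widely they spread their scores, which would leave accuracy unchanged but alter the loss value; in both cases the absolute score of a single response carries no meaning on its own.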


Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science