Multiple Choice

An AI team is training a system to learn from human preferences. They have a dataset where for a given input x, humans consistently prefer response y_preferred over response y_rejected. After training, they test two different scoring models, Model A and Model B, on this pair. The models produce the following scores:

  • Model A: score(x, y_preferred) = 3.2, score(x, y_rejected) = 1.5
  • Model B: score(x, y_preferred) = -0.5, score(x, y_rejected) = -2.0

Based on these scores, which statement accurately evaluates the models' performance on this specific example?
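One way to check the example is to compare, for each model, the score of the preferred response with the score of the rejected one. The sketch below does that and also reports a pairwise preference probability. It assumes a Bradley-Terry style formulation, sigmoid(score_preferred - score_rejected), which the question itself does not specify. The function names and the `results` dictionary are hypothetical and used only for illustration.

```python
import math

def pairwise_margin(score_preferred, score_rejected):
    # Positive margin means the model ranks the preferred response higher.
    return score_preferred - score_rejected

def preference_probability(score_preferred, score_rejected):
    # Assumed Bradley-Terry form: sigmoid of the score difference.
    margin = pairwise_margin(score_preferred, score_rejected)
    return 1.0 / (1.0 + math.exp(-margin))

def pairwise_loss(score_preferred, score_rejected):
    # Negative log-likelihood of the human preference under the assumed form.
    return -math.log(preference_probability(score_preferred, score_rejected))

# Scores taken from the question: (score for y_preferred, score for y_rejected).
scores = {
    "Model A": (3.2, 1.5),
    "Model B": (-0.5, -2.0),
}

results = {}
for name, (s_pref, s_rej) in scores.items():
    results[name] = {
        "margin": pairwise_margin(s_pref, s_rej),
        "ranks_correctly": s_pref > s_rej,
        "probability": preference_probability(s_pref, s_rej),
        "loss": pairwise_loss(s_pref, s_rej),
    }
    print(name, results[name])
```

Under this assumption only the difference between the two scores matters, not their absolute values or signs, so Model B's negative scores do not by themselves indicate an incorrect ranking.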


Updated 2025-09-26


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
