Multiple Choice

A reward model is being trained on a dataset of human preferences, where each data point consists of a prompt, a preferred response, and a rejected response. The training process aims to minimize a ranking loss function. For a single data point, which of the following outcomes would generate the largest loss value, thereby prompting the most significant update to the model's parameters?

0

1

Updated 2025-10-10

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science