Multiple Choice

A language model is being trained on a preference dataset. For a single input prompt, the ground-truth ranked sequence of responses is Y. The model calculates the probability of observing this exact sequence as Pr(Y|x) = 0.25. Based on the formula for the objective function that maximizes the likelihood of the model predicting the correct rankings, what is the loss value for this single data point?

0

1
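
Worked calculation: a minimal sketch, assuming the objective in question is the negative log-likelihood of the observed ranking, loss = -log Pr(Y|x), taken with the natural logarithm. The function name ranking_nll is hypothetical and used only for illustration; it does not come from the source.

```python
import math


def ranking_nll(prob_of_ranking: float) -> float:
    """Negative log-likelihood of the observed ranking (natural log assumed)."""
    if not 0.0 < prob_of_ranking <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob_of_ranking)


# Single data point from the question: Pr(Y|x) = 0.25
loss = ranking_nll(0.25)
print(f"{loss:.4f}")  # 1.3863, i.e. ln(4)
```

Under this assumption the loss is positive and shrinks toward 0 as Pr(Y|x) approaches 1. The numeric value depends on the log base: with a base-2 logarithm the same data point would give 2 instead.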

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science