Multiple Choice

A model is being trained to evaluate text completions for a given prompt. The training data consists of pairs of completions for each prompt, where one is marked as 'preferred' ($y_a$) and the other as 'dispreferred' ($y_b$) by human reviewers. The model learns by minimizing the following loss function, averaged over all pairs in the dataset:

$$\mathcal{L} = -\log \sigma(score(y_a) - score(y_b))$$

where $\sigma$ is the sigmoid function and $score(y)$ is the value the model assigns to a completion $y$.
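To experiment with how this loss responds to different score pairs, a minimal Python sketch may help. It assumes the scores are plain floats rather than outputs of a trained model, and the names `pairwise_loss` and `mean_pairwise_loss` are hypothetical, chosen only for illustration. It uses the identity $-\log \sigma(d) = \log(1 + e^{-d})$ so that large score gaps do not overflow.

```python
import math


def pairwise_loss(score_a: float, score_b: float) -> float:
    """Return -log(sigmoid(score_a - score_b)) for one preference pair.

    score_a is the score of the preferred completion y_a,
    score_b is the score of the dispreferred completion y_b.
    """
    d = score_a - score_b
    # -log(sigmoid(d)) == log(1 + exp(-d)), written in a form that
    # never exponentiates a large positive number.
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (score_a, score_b) tuples."""
    return sum(pairwise_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative score pairs (made up, not from any dataset).
    for a, b in [(0.0, 2.0), (0.0, 0.0), (2.0, 0.0)]:
        print(f"score(y_a)={a}, score(y_b)={b}, loss={pairwise_loss(a, b):.4f}")
```

Note that the function only ever uses the difference `score_a - score_b`, so adding the same constant to both scores leaves the loss unchanged.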

What is the primary effect of minimizing this loss function on the scores the model assigns?

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science