Multiple Choice

A reward model is being trained using a loss function calculated as the negative log of a sigmoid applied to the difference in scores between a preferred response (y_a) and a rejected response (y_b). For a single training instance, the model outputs a score of r(y_a) = 2.0 for the preferred response and r(y_b) = 3.0 for the rejected response. How will this specific outcome influence the model's parameter update for this step?
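To make the outcome concrete, here is a minimal numeric sketch of the loss the question describes, L = -log(sigmoid(r(y_a) - r(y_b))); the function names are illustrative, not from any particular library. Because the preferred response scored lower than the rejected one, the sigmoid's argument is negative, the loss is large, and the gradient drives a substantial update pushing r(y_a) up and r(y_b) down.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Pairwise loss: L = -log(sigmoid(r(y_a) - r(y_b)))
def pairwise_loss(r_a: float, r_b: float) -> float:
    return -math.log(sigmoid(r_a - r_b))

r_a, r_b = 2.0, 3.0          # preferred response scored LOWER than rejected
loss = pairwise_loss(r_a, r_b)
# sigmoid(-1.0) ≈ 0.2689, so loss ≈ 1.3133, much larger than the
# ≈ 0.3133 loss the correct ordering (r_a = 3.0, r_b = 2.0) would give.

# Gradient of L w.r.t. the score gap d = r_a - r_b is -(1 - sigmoid(d)):
# here ≈ -0.7311, so gradient descent strongly increases r(y_a)
# and decreases r(y_b) on this step.
grad_gap = -(1.0 - sigmoid(r_a - r_b))
```

The magnitude of `grad_gap` shrinks toward zero as the model learns to rank the preferred response higher, so misordered pairs like this one dominate the update.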


Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science