Multiple Choice

A team is training a reward model with a loss function that only considers the relative ranking between two responses, i.e., it only requires that the preferred response score higher than the dispreferred one. They observe that while the model learns the correct rankings, the absolute reward scores it assigns can grow without bound: because the loss depends only on the score difference, the pair (+1,000,000, +999,999) is treated exactly the same as (+2, +1). To fix this, they add a regularization term that penalizes the squared sum of the two rewards in each training pair. Which statement best analyzes how this specific regularization addresses the problem?
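
For concreteness, here is a minimal sketch of the setup the question describes, assuming a PyTorch implementation; the function name `reward_loss` and the coefficient `reg_coeff` are illustrative, not from any specific library:

```python
import torch
import torch.nn.functional as F

def reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor,
                reg_coeff: float = 0.01) -> torch.Tensor:
    # Pairwise (Bradley-Terry style) ranking loss: it depends only on the
    # score gap, so (+1_000_000, +999_999) and (+2, +1) yield identical loss.
    ranking_loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Squared-sum penalty: pushes r_chosen + r_rejected toward 0 in each pair,
    # anchoring the absolute scale without touching the gap the ranking term uses.
    scale_penalty = reg_coeff * (r_chosen + r_rejected).pow(2).mean()
    return ranking_loss + scale_penalty

# The gap is 1 in both pairs, so the ranking term is equal; only the
# penalty distinguishes the runaway scores from the well-scaled ones.
small = reward_loss(torch.tensor([2.0]), torch.tensor([1.0]))
large = reward_loss(torch.tensor([1_000_000.0]), torch.tensor([999_999.0]))
print(small.item(), large.item())
```

Note the design point the question hinges on: the penalty is minimized when each pair's scores are centered around zero, which bounds the absolute scale while leaving the score difference, and hence the ranking behavior, free.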


Tags: Ch.4 Alignment - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science