True/False

A reward model is trained with a pointwise method in which human annotators assign an absolute quality score to each response. The model's high sensitivity to disagreement among annotators is primarily caused by the regression algorithm's inherent difficulty in handling a wide numerical range of scores.

0

1
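
A minimal sketch (with hypothetical scores) of the pointwise setup the question describes. It illustrates the standard observation behind this item: when annotators use differently calibrated absolute scales, the same response receives conflicting regression targets, while pairwise preferences between responses remain consistent; the instability comes from calibration disagreement, not from the numeric range of the scores.

```python
# Hypothetical absolute scores from two annotators rating the same two
# responses. Annotator B is stricter and shifts every score down by 2.
scores_a = {"resp1": 8.0, "resp2": 5.0}  # annotator A's absolute scores
scores_b = {"resp1": 6.0, "resp2": 3.0}  # annotator B's absolute scores

# Pointwise regression targets conflict: the same response gets two
# different "ground truth" rewards, so an MSE-style loss receives noisy,
# annotator-dependent supervision.
targets_resp1 = (scores_a["resp1"], scores_b["resp1"])  # (8.0, 6.0) disagree

# Pairwise comparison is invariant to each annotator's calibration:
# both annotators agree resp1 is better than resp2, so a preference-based
# loss (e.g. Bradley-Terry) sees a consistent training signal.
pref_a = scores_a["resp1"] > scores_a["resp2"]
pref_b = scores_b["resp1"] > scores_b["resp2"]
assert pref_a == pref_b  # preferences agree even though raw scores differ
```

This is why preference-based (pairwise) annotation is typically favored for reward-model training over absolute scoring.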

Updated 2025-10-06


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science