1Cademy

Learn Before

Continuous Supervision from the RLHF Reward Model

Multiple Choice

A team is training a language model to generate helpful responses. They are considering two different feedback mechanisms to guide the training process:

* **Mechanism A:** A classifier that labels each generated response as either 'Good' or 'Bad'.
* **Mechanism B:** A scoring model that assigns each generated response a numerical score from 1 to 10, representing its degree of quality.

Which statement best analyzes the fundamental advantage of using Mechanism B over Mechanism A for refining

Updated 2025-09-28
