Learn Before
A development team has a pre-trained language model and wants to fine-tune it to produce responses that are more helpful and safe. Their strategy involves first creating a separate model whose sole job is to score how good a given response is, based on human preferences. Which of the following best describes the data and objective used to train this specific 'scoring' model?
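The "scoring" model the question describes is conventionally called a reward model: it is trained on pairs of responses to the same prompt that human annotators have ranked, with an objective that pushes the preferred response's score above the rejected one's. A minimal sketch of that pairwise (Bradley-Terry) objective, using hypothetical scalar scores rather than a real model:

```python
import math

def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise preference loss for a reward model.

    Maximizes the probability that the human-preferred ("chosen")
    response outscores the rejected one:
        loss = -log(sigmoid(score_chosen - score_rejected))
    """
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scalar scores assigned to two responses to the same
# prompt, where the annotator preferred the first response.
loss_when_agreeing = reward_model_loss(2.0, -1.0)   # model agrees with label
loss_when_disagreeing = reward_model_loss(-1.0, 2.0)  # model contradicts label

# The loss is small when the model's ranking matches the human label
# and large when it contradicts it.
assert loss_when_agreeing < loss_when_disagreeing
```

The key point for the question: the training data is comparisons (or rankings) of responses, not "correct answer" labels, and the objective is to make the scalar score consistent with those human preferences.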
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
You are tasked with aligning a large language model to better follow human preferences using a reward-based approach. Arrange the following high-level stages of the process into the correct chronological order.
Diagnosing Reward Model Failure
Rating LLM Outputs for Reward Models
Challenges of Rating LLM Outputs