1Cademy - An AI team is training a reward model using a dataset where, for each prompt, human annotators have ranked several generated responses from best to worst. What is the fundamental task the reward model is being trained to perform based on this specific type of data?

Learn Before

Reward Model Learning in RLHF

Multiple Choice

An AI team is training a reward model using a dataset where, for each prompt, human annotators have ranked several generated responses from best to worst. What is the fundamental task the reward model is being trained to perform based on this specific type of data?
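
For context, ranked data of this kind is commonly expanded into pairwise comparisons, and the reward model is trained so that a higher-ranked response receives a higher scalar score than a lower-ranked one. The sketch below illustrates that idea with a Bradley-Terry style pairwise loss in plain Python. The function names, the response IDs, and the hard-coded scores are hypothetical and stand in for the outputs of a real model; this is an illustration under those assumptions, not the course's reference implementation.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each
    pair is always ranked above the second.
    """
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(scores, ranked_responses):
    """Mean of -log(sigmoid(r_preferred - r_rejected)) over all pairs.

    `scores` maps a response ID to a scalar reward (hypothetical values
    here; in practice they would come from the reward model).
    """
    pairs = ranking_to_pairs(ranked_responses)
    total = 0.0
    for preferred, rejected in pairs:
        margin = scores[preferred] - scores[rejected]
        # -log(sigmoid(margin)) is equal to log(1 + exp(-margin))
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


# One prompt with three responses, ranked best to worst by annotators.
ranking = ["resp_a", "resp_b", "resp_c"]

# Scores that agree with the human ranking vs. scores that reverse it.
agreeing = {"resp_a": 2.0, "resp_b": 0.5, "resp_c": -1.0}
reversed_scores = {"resp_a": -1.0, "resp_b": 0.5, "resp_c": 2.0}

loss_agree = pairwise_ranking_loss(agreeing, ranking)
loss_reversed = pairwise_ranking_loss(reversed_scores, ranking)
print(loss_agree < loss_reversed)
```

Scores that respect the annotators' ordering yield a lower loss than scores that invert it, which is the signal that pushes the model toward assigning scores consistent with human preferences.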

Updated 2025-10-05

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences