1Cademy - A team is training a neural network to evaluate the quality of different text outputs generated in response to a prompt. The training data consists of many examples, where each example includes a prompt, a pair of generated text outputs (Output A and Output B), and a label indicating which output was preferred by a human evaluator. The networks goal is to learn to assign a single numerical score to any given output. Which of the following best describes the fundamental objective that guides the

Learn Before

General Loss Minimization Objective for Reward Model Training

Multiple Choice

A team is training a neural network to evaluate the quality of different text outputs generated in response to a prompt. The training data consists of many examples, where each example includes a prompt, a pair of generated text outputs (Output A and Output B), and a label indicating which output was preferred by a human evaluator. The network's goal is to learn to assign a single numerical score to any given output. Which of the following best describes the fundamental objective that guides the

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related