Learn Before
A human evaluator is presented with the following prompt and two responses. The evaluator chooses Response A as the better one. This interaction is used to create a single data point for training a reward model, structured as a tuple containing an input prompt (x), a preferred response (y_k1), and a rejected response (y_k2). Match each item below to its correct role in this data sample.
Prompt: 'Summarize the plot of Hamlet in three sentences.' Response A: 'Hamlet is a play about a prince who seeks revenge for his father's murder. He feigns madness, confronts his mother, and duels his uncle's co-conspirator, leading to a tragic end for the royal family.' Response B: 'Hamlet is a famous play.'
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
A team is creating a dataset to train a reward model. The model's objective is to learn to assign higher scores to helpful and detailed responses over unhelpful or overly brief ones. For the input prompt
x = 'Explain the water cycle.', which of the following data samples, represented as a tuple(prompt, chosen_response, rejected_response), would be the most effective and correctly structured training point for this objective?Constructing a Preference Data Sample from Human Feedback
A human evaluator is presented with the following prompt and two responses. The evaluator chooses Response A as the better one. This interaction is used to create a single data point for training a reward model, structured as a tuple containing an input prompt (x), a preferred response (y_k1), and a rejected response (y_k2). Match each item below to its correct role in this data sample.
Prompt: 'Summarize the plot of Hamlet in three sentences.' Response A: 'Hamlet is a play about a prince who seeks revenge for his father's murder. He feigns madness, confronts his mother, and duels his uncle's co-conspirator, leading to a tragic end for the royal family.' Response B: 'Hamlet is a famous play.'
Preference Dataset Sampling Operation