Learn Before
Reward Model Input Preparation
A data scientist is preparing a single data instance to be evaluated by a reward model. The instance consists of a prompt, x = 'Explain the water cycle in simple terms.', and a generated response, y = 'Water evaporates from oceans, forms clouds, and then falls back as rain.'. Describe the exact structure of the input sequence that should be fed into the reward model for this instance to ensure a meaningful evaluation.
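The expected structure can be illustrated with a minimal sketch: the prompt and response are joined into one contiguous sequence [x; y], with the prompt first so the response is scored in context. The separator and helper name here are assumptions for illustration, not a prescribed template.

```python
def build_reward_input(prompt: str, response: str, sep: str = "\n") -> str:
    """Concatenate prompt x and response y into a single sequence [x; y].

    The reward model evaluates the response *in the context of* the
    prompt, so both must appear in one input sequence, prompt first.
    The separator is an illustrative choice; real systems often use a
    chat template or special tokens instead.
    """
    return prompt + sep + response

x = "Explain the water cycle in simple terms."
y = "Water evaporates from oceans, forms clouds, and then falls back as rain."

sequence = build_reward_input(x, y)
print(sequence)
```

In practice this string would then be tokenized and passed through the reward model, which typically reads a scalar score from the representation of the final token.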
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence Representation for Reward Calculation in RLHF
A team is developing a model to automatically assign a quality score to an AI-generated response. To do this, the model must be given some text as input. Which of the following best explains why the model should be given the original prompt concatenated with the AI's response, instead of just the AI's response alone?
Reward Model Input Preparation
Debugging a Reward Model's Input Formulation