Designing a Context-Aware Reward Model
An AI development team is creating a reward model to score the quality of individual paragraphs within a multi-paragraph article generated in response to a user's query. A key requirement is that the score for a paragraph must account for its coherence with the rest of the article and its direct relevance to the original query. To achieve this, what three distinct components must be provided as input to the reward model when it evaluates a single paragraph? Briefly explain the role of each component in meeting the design requirements.
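As a rough illustration of the setup described above, the sketch below packages one possible set of inputs for scoring a single paragraph: the original query (for relevance), the full article (for coherence with the surrounding text), and the paragraph under evaluation. All names here (SegmentRewardInput, format_for_reward_model, paragraph_inputs) and the bracketed section labels are hypothetical and chosen only for this example; they are not taken from the course material or from any library, and no actual reward model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentRewardInput:
    """Hypothetical container for one paragraph-level scoring request."""
    query: str      # original user query: lets the model judge relevance
    article: str    # full generated article: lets the model judge coherence
    paragraph: str  # the specific paragraph being scored

    def __post_init__(self) -> None:
        # Guard against scoring text that is not part of the article.
        if self.paragraph not in self.article:
            raise ValueError("paragraph must be taken from the article")


def format_for_reward_model(item: SegmentRewardInput) -> str:
    """Serialize the three components into one text input (assumed layout)."""
    return (
        f"[QUERY]\n{item.query}\n\n"
        f"[ARTICLE]\n{item.article}\n\n"
        f"[PARAGRAPH TO SCORE]\n{item.paragraph}"
    )


def paragraph_inputs(query: str, article: str) -> List[SegmentRewardInput]:
    """Build one scoring request per paragraph, splitting on blank lines."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    return [SegmentRewardInput(query, article, p) for p in paragraphs]


if __name__ == "__main__":
    article = "Solar panels convert light.\n\nThey need little upkeep."
    for item in paragraph_inputs("How do solar panels work?", article):
        print(format_for_reward_model(item))
        print("-" * 20)
```

Passing the full article with every request, rather than the paragraph alone, is what would let a scorer notice a paragraph that reads well in isolation but contradicts earlier text.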
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Related
Segment-Based Reward Score Formula
A team is developing a system to score individual sentences within a long, multi-paragraph response generated by a model. They observe that the system sometimes gives a high score to a sentence that, while well-written in isolation, directly contradicts information presented in a previous paragraph of the same response. Which of the following is the most likely reason for this evaluation error?
Designing a Context-Aware Reward Model
A reward model is designed to evaluate the quality of a specific sentence within a longer, AI-generated response. For the model to accurately score the sentence, it requires three distinct pieces of information as input. Match each required input component with its primary role in the evaluation process.