Comparing Reward Optimization Strategies
A team is training a language model with two distinct and sometimes conflicting reward models: one for maximizing helpfulness and another for ensuring factual accuracy. The team is considering two strategies: 1) combining the two reward models into a single, weighted score, or 2) treating each reward model as a separate objective in a multi-objective optimization framework. Analyze the potential trade-offs, benefits, and challenges of choosing the second strategy (multi-objective optimization) over the first (single combined score).
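The contrast between the two strategies can be made concrete with a small sketch. The code below uses hypothetical reward values and weights (none of these numbers come from the question) to show how strategy 1 collapses both rewards into one weighted scalar, while strategy 2 keeps the reward vector intact and compares candidates by Pareto dominance, a standard multi-objective primitive:

```python
# Toy sketch (hypothetical reward values) contrasting the two strategies:
# (1) scalarize the two rewards into one weighted score, vs.
# (2) keep them as a vector and compare candidates by Pareto dominance.

def weighted_score(r_help, r_fact, w_help=0.7, w_fact=0.3):
    """Strategy 1: collapse both rewards into a single scalar.
    The ranking of candidates now depends entirely on the chosen weights."""
    return w_help * r_help + w_fact * r_fact

def dominates(a, b):
    """Strategy 2 primitive: candidate a Pareto-dominates b if it is
    at least as good on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Two candidate responses scored as (helpfulness, factual accuracy).
cand_a = (0.9, 0.3)   # very helpful, weakly factual
cand_b = (0.6, 0.8)   # moderately helpful, strongly factual

# Scalarization forces a total order: with these weights, cand_a "wins".
print(weighted_score(*cand_a) > weighted_score(*cand_b))  # prints True

# Multi-objective comparison: neither dominates the other, so both
# remain on the Pareto front and the trade-off is preserved explicitly.
print(dominates(cand_a, cand_b), dominates(cand_b, cand_a))  # prints False False
```

The sketch illustrates the core trade-off the question asks about: scalarization buries the helpfulness/accuracy tension inside a weight choice made before training, whereas the multi-objective view keeps the conflict visible but requires a policy for choosing among non-dominated candidates.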
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team is training a language model using two separate reward models: one that rewards helpfulness (RM-H) and another that rewards safety (RM-S). These two objectives are often in conflict. Instead of creating a single, combined reward score, the team decides to train the policy to optimize for both objectives simultaneously as distinct goals. Which of the following outcomes is the most direct and characteristic result of this specific training approach?
Optimizing a Chatbot for Competing Goals