Multiple Choice

A development team is training a language model using two separate reward models: one that rewards helpfulness (RM-H) and another that rewards safety (RM-S). These two objectives are often in conflict. Instead of creating a single, combined reward score, the team decides to train the policy to optimize for both objectives simultaneously as distinct goals. Which of the following outcomes is the most direct and characteristic result of this specific training approach?
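The setup described above is multi-objective optimization: with two distinct reward signals and no single scalarized score, candidate outputs can no longer be totally ordered, and the policy instead faces a Pareto frontier of helpfulness/safety trade-offs. A minimal sketch of that idea, using made-up scores and a toy `pareto_front` helper (not any real RM-H or RM-S implementation):

```python
# Toy illustration of multi-objective comparison under two reward signals.
# Scores and the helper below are hypothetical, for illustration only.

def pareto_front(candidates):
    """Return the (helpfulness, safety) pairs not dominated by any other.

    A candidate dominates another if it is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for i, (h_i, s_i) in enumerate(candidates):
        dominated = any(
            (h_j >= h_i and s_j >= s_i) and (h_j > h_i or s_j > s_i)
            for j, (h_j, s_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((h_i, s_i))
    return front

# (helpfulness, safety) scores for four hypothetical responses
scores = [(0.9, 0.2), (0.7, 0.8), (0.4, 0.9), (0.3, 0.3)]
print(pareto_front(scores))  # a trade-off set, not a single winner
```

Note that three of the four candidates survive: none of them can be improved on one objective without losing on the other, which is the characteristic outcome of treating the two rewards as distinct goals rather than collapsing them into one score.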


Updated 2025-10-01


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy

Cognitive Psychology
