Concept

Multi-Objective Optimization for Policy Training with Multiple Reward Models

Instead of combining multiple reward models into a single reward signal, an alternative approach is to treat the task as a multi-objective optimization problem. This framework involves training the policy to simultaneously optimize for the objectives defined by each individual reward model.

0

1

Updated 2026-05-03

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course