Multiple Choice

An AI alignment team is evaluating a language model's response using three distinct reward models: Helpfulness, Harmlessness, and Conciseness. For a specific response, the models provide the following scores and are assigned the following weights:

  • Helpfulness: Score = 8.0, Weight = 2.0
  • Harmlessness: Score = 9.0, Weight = 3.0
  • Conciseness: Score = 6.0, Weight = 1.0

Using the weighted average formula for combining rewards, what is the final aggregated reward score for this response? (Assume K is the total number of models).
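The arithmetic can be checked with a minimal Python sketch. It assumes the combined reward takes the form r = (1/K) · Σ_k w_k · r_k, which is what the hint about K suggests, but the exact formula used in the source chapter is an assumption here. For comparison, the sketch also computes the weight-normalized average Σ_k w_k · r_k / Σ_k w_k, another common reading of "weighted average". Both function names are hypothetical helpers, not part of any library.

```python
from typing import Sequence


def combine_rewards_over_k(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical helper: weighted sum of rewards divided by K (number of models).

    Assumed form: r = (1/K) * sum_k w_k * r_k
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    k = len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / k


def combine_rewards_normalized(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical helper: weighted sum divided by the total weight instead of K."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Order: Helpfulness, Harmlessness, Conciseness
scores = [8.0, 9.0, 6.0]
weights = [2.0, 3.0, 1.0]

over_k = combine_rewards_over_k(scores, weights)
normalized = combine_rewards_normalized(scores, weights)
print(f"(1/K) * weighted sum: {over_k:.2f}")            # prints 16.33
print(f"weight-normalized average: {normalized:.2f}")   # prints 8.17
```

Under either reading the weighted sum is 2.0 × 8.0 + 3.0 × 9.0 + 1.0 × 6.0 = 49.0; the two forms differ only in the divisor (K = 3 versus total weight 6), so which answer is intended depends on the formula given in the course material.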


Updated 2025-09-26


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy
