Learn Before
Component Analysis of the Combined Reward Formula
In the formula for a combined reward score, $r_{\text{combined}} = \frac{1}{K} \sum_{k=1}^{K} w_k \cdot r_k(x, y)$, explain the distinct purpose of the weight term ($w_k$) compared to the normalization factor ($\frac{1}{K}$). Why is it important to have both in the calculation?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Using Combined Reward for Policy Supervision
An AI alignment team is evaluating a language model's response using three distinct reward models: Helpfulness, Harmlessness, and Conciseness. For a specific response, the models provide the following scores and are assigned the following weights:
- Helpfulness: Score = 8.0, Weight = 2.0
- Harmlessness: Score = 9.0, Weight = 3.0
- Conciseness: Score = 6.0, Weight = 1.0
Using the weighted average formula for combining rewards, what is the final aggregated reward score for this response? (Assume K is the total number of models).
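The arithmetic for this question can be sketched in a few lines of Python. This is a minimal sketch, assuming the combined reward is the sum of weighted scores divided by K, the number of reward models, as the hint in the question suggests. The function name `combined_reward` and the `rewards` dictionary are hypothetical names chosen for illustration. If the intended formula divides by the sum of the weights instead, only the denominator changes.

```python
def combined_reward(scores, weights):
    """Weighted sum of reward scores, normalized by the number of models K.

    Assumption: normalization is 1/K (not 1/sum(weights)).
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if not scores:
        raise ValueError("at least one reward model is required")
    k = len(scores)  # K: total number of reward models
    weighted_sum = sum(w * r for w, r in zip(weights, scores))
    return weighted_sum / k


# Scores and weights from the question: name -> (score, weight)
rewards = {
    "Helpfulness": (8.0, 2.0),
    "Harmlessness": (9.0, 3.0),
    "Conciseness": (6.0, 1.0),
}
scores = [score for score, _ in rewards.values()]
weights = [weight for _, weight in rewards.values()]

# (2.0 * 8.0 + 3.0 * 9.0 + 1.0 * 6.0) / 3 = 49 / 3
print(round(combined_reward(scores, weights), 2))
```

Under this assumption, the weights set how much each reward model contributes relative to the others, while dividing by K keeps the result from growing simply because more models were added.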
Adjusting Chatbot Behavior via Reward Model Weighting