An AI development team is using an ensemble of three separate models to evaluate a single generated response. The first model gives the response a score of 8.0, the second model gives it a score of 9.0, and the third model gives it a score of 7.0. To create a more robust and stable final evaluation, the team decides to use a simple averaging method. What is the final combined score for the response?
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Combined Reward Formula
An AI development team is using an ensemble of three separate models to evaluate a single generated response. The first model gives the response a score of 8.0, the second model gives it a score of 9.0, and the third model gives it a score of 7.0. To create a more robust and stable final evaluation, the team decides to use a simple averaging method. What is the final combined score for the response?
An AI development team is using three specialized reward models to evaluate generated text: one for general helpfulness, one for factual accuracy, and one for safety. They combine the outputs of these models by taking a simple, unweighted average to produce a single final score. What is the most significant potential drawback of this specific approach?
Evaluating a Chatbot's Response Score