Comparing AI Evaluation Systems
An AI development team is building a chatbot to provide factual answers to historical questions. They are considering two different automated systems to improve its responses:
System 1: An automated process that cross-references the chatbot's answer against a curated database of historical facts to assign a 'correctness' score.
System 2: A model trained on data where human historians have rated pairs of answers for the same question, indicating which one is more comprehensive, well-written, and nuanced.
Analyze these two systems. Explain which system functions like a verifier and which functions like a reward model, and describe the fundamental difference in what each system is designed to evaluate.
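As an aid to the analysis, the two systems can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: `FACT_DB`, `system1_score`, and `system2_pairwise_loss` are hypothetical names, the substring match is a deliberately crude stand-in for cross-referencing, and the pairwise logistic loss is one common way to train on "which answer is better" ratings, assumed here because the scenario does not specify one.

```python
import math

# Hypothetical curated database mapping a question to its accepted fact (assumption).
FACT_DB = {
    "In what year did the Western Roman Empire fall?": "476",
    "Who was the first Roman emperor?": "augustus",
}


def system1_score(question: str, answer: str) -> float:
    """System 1 sketch: score an answer by checking it against the curated fact.

    Returns 1.0 if the stored fact appears in the answer, otherwise 0.0.
    Substring matching is a crude placeholder for real cross-referencing.
    """
    fact = FACT_DB.get(question)
    if fact is None:
        return 0.0  # no reference fact available, so nothing can be confirmed
    return 1.0 if fact in answer.lower() else 0.0


def system2_pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """System 2 sketch: training loss for one historian-rated pair of answers.

    The model assigns a scalar score to each answer. The loss
    -log(sigmoid(score_preferred - score_rejected)) is small when the answer
    the historians preferred receives the higher score.
    """
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```

Note the difference in what each function needs: `system1_score` requires a reference fact and returns an absolute score for a single answer, while `system2_pairwise_loss` needs no reference at all and only learns from which of two answers the human raters preferred.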