A team of engineers is evaluating a new language model's reasoning capabilities. They use an assessment method where the model must choose the single correct answer from a set of provided options for each question. Which of the following represents a primary limitation of this evaluation method for gauging the model's genuine comprehension?
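As a hedged illustration of the assessment method described in the question, the sketch below scores multiple-choice answers by exact match on the selected option. It also computes the accuracy expected from uniform random guessing. The function names `mcq_accuracy` and `chance_accuracy` are hypothetical, and the option letters and answer key are made-up examples, not data from any benchmark.

```python
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the chosen option label matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    # Only the selected label is compared; nothing else about the response is scored.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


# Hypothetical predictions and answer key for four 4-option questions.
preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
print(mcq_accuracy(preds, gold))  # 0.75
print(chance_accuracy(4))         # 0.25
```

Because the metric compares only the selected option with the key, a reported score is best read against the chance baseline for the number of options offered.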
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
MMLU Benchmark
AI Tutor Design Strategy
Designing a Challenging Multiple-Choice Question for a Language Model
Example of a Sentence-First Prompt for Grammaticality Judgment with Answer Options