MMLU Benchmark
The MMLU (Massive Multitask Language Understanding) benchmark is a prominent example of how complex reasoning tasks can be structured in a question-answering format. Each problem is posed as a multiple-choice question drawn from one of 57 subjects, and the large language model must select the single correct answer from four options (A, B, C, or D).
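As a rough illustration, the sketch below shows how a single MMLU-style item might be rendered as a prompt and scored by the option letter the model returns. The question, the choices, and the ask_model call are hypothetical placeholders introduced here for the example; they are not part of any official evaluation harness.

```python
# Minimal sketch of formatting and scoring one MMLU-style multiple-choice item.
# All content below is illustrative; `ask_model` stands in for whatever
# completion API is being evaluated (an assumption, not a real library call).

def format_mmlu_prompt(question: str, choices: list[str]) -> str:
    """Render a multiple-choice question in the usual A/B/C/D layout."""
    letters = ["A", "B", "C", "D"]
    lines = [question]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def grade(model_answer: str, correct_letter: str) -> bool:
    """Score the response by the first option letter the model emits."""
    return model_answer.strip().upper().startswith(correct_letter)


if __name__ == "__main__":
    prompt = format_mmlu_prompt(
        "Which planet is known as the Red Planet?",
        ["Venus", "Mars", "Jupiter", "Saturn"],
    )
    print(prompt)
    # model_answer = ask_model(prompt)   # hypothetical model call
    # print(grade(model_answer, "B"))
```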
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
MMLU Benchmark
A product development team is using a large language model to check if a new product concept aligns with their company's core principles. Their initial prompt, "Analyze if our new 'Smart-Mug' concept is consistent with our principles of 'sustainability,' 'simplicity,' and 'affordability'," yields vague and unhelpful responses. How could this reasoning task be most effectively restructured into a question-answering format to guide the model toward a more structured and deductive output?
Improving a Data Analysis Prompt
Reframing a Research Query
MMLU Benchmark
A team of engineers is evaluating a new language model's reasoning capabilities. They use an assessment method where the model must choose the single correct answer from a set of provided options for each question. Which of the following represents a primary limitation of this evaluation method for gauging the model's genuine comprehension?
AI Tutor Design Strategy
Designing a Challenging Multiple-Choice Question for a Language Model
Example of a Sentence-First Prompt for Grammaticality Judgment with Answer Options