Critiquing an Automated Prompt Evaluation Setup
Based on the scenario below, provide a critique of the company's evaluation method. Explain why a high score on their chosen metric might not be a reliable indicator of a high-quality output in this context.
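The scenario does not say which metric the company uses. The sketch below assumes it is a surface word-overlap score in the style of ROUGE-1. This is a simplified, hypothetical reimplementation, not the official ROUGE package. It shows how such a metric can reward an output that contradicts the reference more than a faithful paraphrase that uses different words. The example sentences are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 based on clipped unigram overlap (illustrative only)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # each shared word counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference finding and two candidate summaries.
reference = "The drug reduced mortality in the treatment group."
faithful = "Patients who received the medication were less likely to die."
contradictory = "The drug did not reduce mortality in the treatment group."

print(f"faithful:      {unigram_f1(faithful, reference):.3f}")       # faithful:      0.111
print(f"contradictory: {unigram_f1(contradictory, reference):.3f}")  # contradictory: 0.778
```

The contradictory summary scores far higher because it reuses most of the reference's words, even though "did not reduce" reverses the finding. An overlap score measures lexical similarity, not factual agreement, so a high value on its own says little about whether the output is correct.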
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is developing a prompt for a Large Language Model to summarize medical research papers. Their primary goal is to ensure the generated summaries are factually accurate and do not misrepresent the findings of the original paper. To automate the evaluation of different prompts, they need to choose a single, pre-defined metric. Which of the following metrics would be the most appropriate for the team to use to assess their primary goal?
You are evaluating a single candidate prompt to see how well it instructs a language model on a specific task. Arrange the following steps into the correct sequence for assessing the prompt's performance on one given input using a pre-defined metric.
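Below is one plausible ordering of the steps for scoring a single prompt on a single input. It assumes that a reference output is available for that input. The function names, the toy translation task, `fake_llm`, and `exact_match` are all hypothetical stand-ins. A real setup would call an actual model and use whatever pre-defined metric the task requires.

```python
from typing import Callable


def evaluate_prompt_on_input(
    prompt_template: str,
    task_input: str,
    reference: str,
    generate: Callable[[str], str],
    metric: Callable[[str, str], float],
) -> float:
    # Step 1: instantiate the candidate prompt with the given input.
    prompt = prompt_template.format(input=task_input)
    # Step 2: run the language model on the filled-in prompt.
    output = generate(prompt)
    # Step 3: score the model's output against the reference with the pre-defined metric.
    return metric(output, reference)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: a toy lookup for one word.
    return "chat" if prompt.rstrip().endswith("cat") else "inconnu"


def exact_match(output: str, reference: str) -> float:
    # Hypothetical metric: 1.0 if the output matches the reference (case-insensitive), else 0.0.
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


template = "Translate the English word into French: {input}"
score = evaluate_prompt_on_input(template, "cat", "chat", fake_llm, exact_match)
print(score)  # 1.0
```

Passing the model and the metric in as functions keeps the procedure fixed while letting either component be swapped. To compare several candidate prompts, you would repeat this call for each prompt over the same inputs and aggregate the scores.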