1Cademy - Critiquing an Automated Prompt Evaluation Setup

Learn Before

Evaluating Prompts with Pre-defined Metrics

Case Study

Critiquing an Automated Prompt Evaluation Setup

Based on the scenario below, provide a critique of the company's evaluation method. Explain why a high score on their chosen metric might not be a reliable indicator of a high-quality output in this context.

Updated 2025-10-02

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A team is developing a prompt for a Large Language Model to summarize medical research papers. Their primary goal is to ensure the generated summaries are factually accurate and do not misrepresent the findings of the original paper. To automate the evaluation of different prompts, they need to choose a single, pre-defined metric. Which of the following metrics would be the most appropriate for the team to use to assess their primary goal?
You are evaluating a single candidate prompt to see how well it instructs a language model on a specific task. Arrange the following steps into the correct sequence for assessing the prompt's performance on one given input using a pre-defined metric.
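The single-input evaluation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the course or by any particular company: the metric (`unigram_f1`, a simple word-overlap score), the stub model (`fake_model`), the prompt template, and the example sentences are all hypothetical and chosen only to show how a summary can score highly on an overlap metric while reversing the finding it is supposed to report.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a candidate and a reference text.

    Hypothetical stand-in for a pre-defined automatic metric.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_prompt(prompt_template, model, source_text, reference, metric):
    """Assess one prompt on one input with a pre-defined metric."""
    prompt = prompt_template.format(text=source_text)  # fill the prompt with the input
    output = model(prompt)                             # run the model on the filled prompt
    return metric(output, reference)                   # score the output against the reference


def fake_model(prompt: str) -> str:
    """Hypothetical stub in place of a real LLM call; returns a flawed summary."""
    return "the drug increased mortality in patients"


# Hypothetical example data: the stub's summary differs from the reference by one word.
template = "Summarize the key finding of this paper:\n{text}"
paper = "(full text of the paper would go here)"
reference_summary = "the drug reduced mortality in patients"

score = evaluate_prompt(template, fake_model, paper, reference_summary, unigram_f1)
print(f"overlap score: {score:.3f}")
```

In this sketch the model's summary shares five of six words with the reference, so the overlap score is high even though "increased" in place of "reduced" inverts the result. That gap between surface similarity and factual accuracy is the kind of weakness a critique of a single automated metric would look for.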
