Case Study

Evaluating Prompts for a Customer Support Chatbot

A company is optimizing a language model to summarize customer complaint emails for its support agents. The goal is to produce a concise, one-sentence summary that accurately captures the core issue. The team tests two candidate prompts on a validation set of 100 emails: after generating a summary for each email with both prompts, human reviewers score each summary as either 'Accurate' or 'Inaccurate'. Review the results below and determine which prompt is more effective, justifying your choice based on the fundamental principle of prompt evaluation.
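The comparison described above reduces to computing each prompt's accuracy on the validation set and selecting the prompt with the higher score. A minimal sketch of that calculation, using invented reviewer labels (the actual results are given below, so the data here is purely illustrative):

```python
def accuracy(labels):
    """Fraction of summaries the reviewers scored 'Accurate'."""
    return sum(1 for label in labels if label == "Accurate") / len(labels)

# Hypothetical reviewer labels for a 10-email slice of the validation set.
prompt_a_labels = ["Accurate"] * 7 + ["Inaccurate"] * 3
prompt_b_labels = ["Accurate"] * 9 + ["Inaccurate"] * 1

acc_a = accuracy(prompt_a_labels)  # 0.7
acc_b = accuracy(prompt_b_labels)  # 0.9

better = "Prompt A" if acc_a > acc_b else "Prompt B"
print(f"Prompt A accuracy: {acc_a:.0%}")
print(f"Prompt B accuracy: {acc_b:.0%}")
print(f"More effective on this validation set: {better}")
```

The point of the sketch is the principle, not the arithmetic: the effective prompt is the one that performs better on held-out data under the task's success criterion (here, human-judged accuracy), not the one that looks better on any single example.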


Updated 2025-10-10


Tags

Ch.3 Prompting - Foundations of Large Language Models · Foundations of Large Language Models Course · Computing Sciences · Evaluation in Bloom's Taxonomy · Cognitive Psychology · Psychology · Social Science · Empirical Science · Science