
Learn Before

Evaluation of Candidate Prompts in Prompt Search

Multiple Choice

A team is developing a system to classify customer feedback emails as 'Urgent' or 'Not Urgent'. They have created a set of 20 different instruction prompts to guide a language model in this classification task. To determine the best prompt, they select one sample 'Urgent' email and test each of the 20 prompts on it. They decide to choose the prompt that successfully leads the model to classify this single email as 'Urgent'. What is the most significant flaw in this evaluation strategy?
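
As a way to reason about the scenario, the sketch below compares scoring candidate prompts on one email against scoring them on a small labeled set that contains both classes. Everything in it is a hypothetical stand-in: the two functions `always_urgent` and `keyword_based` play the role of "model plus prompt" pairs (a real system would call a language model), and the emails and labels are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Email = str
Label = str  # "Urgent" or "Not Urgent"
Dataset = List[Tuple[Email, Label]]

# Hypothetical stand-ins for two candidate prompts applied to a model.
def always_urgent(email: Email) -> Label:
    # Mimics a prompt that pushes the model to answer "Urgent" for everything.
    return "Urgent"

def keyword_based(email: Email) -> Label:
    # Mimics a prompt that makes the model react to urgency cues in the text.
    urgent_words = ("outage", "immediately", "asap")
    text = email.lower()
    return "Urgent" if any(w in text for w in urgent_words) else "Not Urgent"

def accuracy(predict: Callable[[Email], Label], dataset: Dataset) -> float:
    # Fraction of emails whose predicted label matches the reference label.
    correct = sum(predict(email) == label for email, label in dataset)
    return correct / len(dataset)

# The evaluation described in the question: a single 'Urgent' email.
single_example: Dataset = [
    ("Our service is down, fix the outage immediately!", "Urgent"),
]

# An illustrative (made-up) labeled set covering both classes.
labeled_set: Dataset = single_example + [
    ("Thanks for the great support last week.", "Not Urgent"),
    ("Please update my billing address when convenient.", "Not Urgent"),
    ("Payment system failing for all users, need help ASAP.", "Urgent"),
]

candidates: Dict[str, Callable[[Email], Label]] = {
    "always_urgent": always_urgent,
    "keyword_based": keyword_based,
}

single_scores = {name: accuracy(fn, single_example) for name, fn in candidates.items()}
full_scores = {name: accuracy(fn, labeled_set) for name, fn in candidates.items()}

print("single email:", single_scores)
print("labeled set: ", full_scores)
```

In this toy setup both candidates pass the single-email check, so that check cannot tell them apart; only the set that includes 'Not Urgent' emails separates them. A realistic evaluation would use a much larger and more representative sample than the four emails assumed here.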

Updated 2025-09-29
