Learn Before
Evaluating Prompts with Pre-defined Metrics
One way to assess the quality of a candidate prompt is to use it to generate an output from a Large Language Model for a given input, and then score that output with a pre-defined metric that measures performance on the specific task.
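The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `toy_model` is a hypothetical keyword rule standing in for a real Large Language Model call, `exact_match` is just one possible pre-defined metric, and the example emails and labels are invented for the demonstration. The sketch averages the metric over several inputs, since a score from a single input says little about the prompt in general.

```python
from typing import Callable, Sequence, Tuple


def exact_match(output: str, reference: str) -> float:
    """Pre-defined metric (assumed here): 1.0 if the labels match, else 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def evaluate_prompt(
    prompt: str,
    model: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Average the metric over (input, reference) pairs for one candidate prompt."""
    if not examples:
        raise ValueError("at least one example is required")
    scores = []
    for text, reference in examples:
        # Step 1: combine the candidate prompt with the input and generate an output.
        output = model(f"{prompt}\n\n{text}")
        # Step 2: apply the pre-defined metric to that output.
        scores.append(metric(output, reference))
    return sum(scores) / len(scores)


def toy_model(full_prompt: str) -> str:
    """Hypothetical stand-in for an LLM: a keyword rule, for illustration only."""
    return "Urgent" if "outage" in full_prompt.lower() else "Not Urgent"


# Invented examples for the demonstration.
examples = [
    ("Our service has an outage, please help now.", "Urgent"),
    ("Thanks for the great product!", "Not Urgent"),
    ("I need a refund immediately.", "Urgent"),
]

score = evaluate_prompt(
    "Classify the email as Urgent or Not Urgent.", toy_model, examples
)
print(f"Average exact-match score: {score:.2f}")
```

To compare several candidate prompts, the same function can be called once per prompt on the same set of examples, and the resulting scores ranked.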
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Evaluating Prompts with Pre-defined Metrics
Using Log-Likelihood to Evaluate Prompts
Pruning the Prompt Candidate Pool
A team is developing a system to classify customer feedback emails as 'Urgent' or 'Not Urgent'. They have created a set of 20 different instruction prompts to guide a language model in this classification task. To determine the best prompt, they select one sample 'Urgent' email and test each of the 20 prompts on it. They decide to choose the prompt that successfully leads the model to classify this single email as 'Urgent'. What is the most significant flaw in this evaluation strategy?
A developer has created a set of candidate prompts to make a language model summarize news articles. To find the best prompt, each one must be evaluated. Arrange the following actions into the correct logical sequence for evaluating a single candidate prompt across a dataset of articles.
Evaluating Prompts for a Customer Support Chatbot
Learn After
A team is developing a prompt for a Large Language Model to summarize medical research papers. Their primary goal is to ensure the generated summaries are factually accurate and do not misrepresent the findings of the original paper. To automate the evaluation of different prompts, they need to choose a single, pre-defined metric. Which of the following metrics would be the most appropriate for the team to use to assess their primary goal?
Critiquing an Automated Prompt Evaluation Setup
You are evaluating a single candidate prompt to see how well it instructs a language model on a specific task. Arrange the following steps into the correct sequence for assessing the prompt's performance on one given input using a pre-defined metric.