Critique of Metric Selection for a Creative LLM
A team is developing a large language model designed to generate creative and original poetry. To evaluate its performance, they rely primarily on an 'exact match' accuracy metric, which calculates the percentage of generated poems that are identical, word for word, to a poem in a pre-written set of reference poems. Critically evaluate the suitability of this metric for this specific application. Justify your reasoning by explaining the potential limitations of this approach and describing the characteristics a more appropriate metric might have.
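For concreteness, the 'exact match' accuracy described in the prompt can be sketched in Python as follows. This is a minimal illustration, not the team's actual evaluation code: the function name `exact_match_accuracy`, the whitespace normalization, and the two sample poems are all assumptions made for the example.

```python
def exact_match_accuracy(generated, references):
    """Percentage of generated texts that equal some reference text word for word."""
    if not generated:
        return 0.0

    # Assumption: collapse whitespace so line wrapping alone does not cause a mismatch.
    def normalize(text):
        return " ".join(text.split())

    reference_set = {normalize(r) for r in references}
    hits = sum(1 for g in generated if normalize(g) in reference_set)
    return 100.0 * hits / len(generated)


# Hypothetical data: one verbatim copy and one original poem.
references = ["Roses are red, violets are blue"]
generated = [
    "Roses are red, violets are blue",   # identical to the reference
    "Crimson roses, violets of indigo",  # original wording, no match
]

score = exact_match_accuracy(generated, references)
print(score)
```

In this sketch only the verbatim copy counts as a hit, so the original poem contributes nothing to the score regardless of its quality.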
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team is creating a language model for a question-answering system. The system's primary function is to provide precise, factually correct, short-phrase answers to user queries (e.g., answering 'What is the main component of glass?' with 'Silicon dioxide'). The team's most critical objective is to measure how often the model produces the factually correct text. Which evaluation metric and justification best aligns with this objective?
Critique of Metric Selection for a Creative LLM
Model Performance Analysis for Customer Support