Diagnosing Model Performance Issues
A company fine-tunes a language model to serve as an expert Q&A system for advanced theoretical physics. It hires a team of recent physics graduates to manually write thousands of question-and-answer pairs for the training data. After deployment, users report that while the model answers undergraduate-level questions correctly, its responses to questions at the forefront of research are often inconsistent, superficial, or incorrect. Based on the data generation method used, what is the most likely underlying cause of this performance gap?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Complexity of Data Annotation for LLMs vs. Conventional NLP
Initial Step in Creating Machine Translation Fine-Tuning Data
Limitations of Manual Data Generation for Fine-Tuning
Difficulty of Human Annotation for Complex Tasks
A small, unfunded research lab wants to fine-tune a language model for a highly specialized, novel task: generating legal summaries of court proceedings for a niche area of patent law. They have access to a few legal experts but have a very limited budget. If they choose to have their experts create the input-output training pairs from scratch, which statement best evaluates the primary trade-off they will face?
Evaluating Data Generation Strategy for a General-Purpose LLM