Evaluating a Data Augmentation Strategy
A development team has a limited set of 200 expert-written examples for training a model to summarize legal documents. To create more training data, they use these examples as few-shot prompts for a powerful, general-purpose language model, asking it to generate thousands of new legal document summaries. Explain the primary advantage and the most significant risk of this data generation method compared to solely using the original 200 expert-written examples.
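As a concrete picture of the method described, here is a minimal sketch of how a few-shot prompt might be assembled from the expert-written examples. It assumes the team also has unlabeled legal documents to summarize, which the question does not state. The function name `build_few_shot_prompt`, the `Document:`/`Summary:` layout, and the choice of `k=3` shots are hypothetical illustrations, not the team's actual setup. The call to the language model is deliberately left out.

```python
import random


def build_few_shot_prompt(seed_examples, new_document, k=3, rng=None):
    """Assemble a few-shot prompt from k randomly chosen seed examples.

    seed_examples: list of (document, summary) pairs written by experts.
    new_document:  an unlabeled document the model should summarize
                   (assumed to be available; hypothetical).
    """
    if rng is None:
        rng = random.Random()
    # Sample without replacement so no expert example repeats in one prompt.
    shots = rng.sample(seed_examples, k)
    parts = [
        f"Document:\n{document}\nSummary:\n{summary}\n"
        for document, summary in shots
    ]
    # The final block leaves the summary empty for the model to complete.
    parts.append(f"Document:\n{new_document}\nSummary:\n")
    return "\n".join(parts)
```

Every generated summary is conditioned on a handful of the same 200 examples, which is worth keeping in mind when weighing the method against using the expert examples alone.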
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is developing a specialized chatbot to answer questions about a company's internal financial policies. They begin with a small, high-quality 'seed' dataset of 150 question-answer pairs written by their finance experts. To expand this dataset, they use the seed examples to prompt a large base model to generate 15,000 new, similar question-answer pairs. This new, larger dataset is then used to fine-tune the chatbot. Which of the following describes the most significant potential weakness of the final chatbot?
AI Assistant Development Strategy