Strategy for Cost-Effective Fine-Tuning Data Curation
Imagine you are leading a project to fine-tune a pre-trained language model to be a versatile creative writing assistant. Your budget for acquiring instruction-following data is severely limited. Propose a practical, cost-effective strategy for creating a high-quality, diverse dataset. In your proposal, explain how your strategy mitigates the risks associated with data scarcity and the high cost of labeling, while still ensuring the model can generalize to a wide range of complex creative prompts.
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Creation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Data Collection Strategies for Fine-Tuning
A development team is fine-tuning a large language model to be a general-purpose assistant but has a very limited budget for data acquisition. They can either purchase a large, inexpensive dataset of simple, templated instructions (e.g., 'Translate this sentence: [sentence]') or a much smaller, very expensive dataset of diverse, complex instructions (e.g., 'Act as a travel agent and plan a 3-day itinerary for a family of four in Paris, focusing on historical sites but including at least one child-friendly activity per day.'). Given these constraints, what is the most significant risk the team faces if they choose to rely exclusively on the large, inexpensive dataset for fine-tuning?