Evaluating Data Collection Strategies for Fine-Tuning
A machine learning team has a limited budget to create a dataset for fine-tuning a large language model. They must choose between two strategies. Analyze the potential outcomes of each strategy and determine which is more likely to result in a model that can handle a wide variety of user requests effectively. Justify your choice by explaining the primary risk of the rejected strategy.
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Data Collection Strategies for Fine-Tuning
A development team is fine-tuning a large language model to be a general-purpose assistant but has a very limited budget for data acquisition. They can either purchase a large, inexpensive dataset of simple, templated instructions (e.g., 'Translate this sentence: [sentence]') or a much smaller, very expensive dataset of diverse, complex instructions (e.g., 'Act as a travel agent and plan a 3-day itinerary for a family of four in Paris, focusing on historical sites but including at least one child-friendly activity per day.'). Given these constraints, what is the most significant risk the team faces if they choose to rely exclusively on the large, inexpensive dataset for fine-tuning?
Strategy for Cost-Effective Fine-Tuning Data Curation