Optimizing Fine-Tuning Data Strategy
A machine learning team has a fixed budget to prepare a dataset for fine-tuning a language model. This budget allows for either creating 5,000 high-quality, human-verified examples or automatically generating 100,000 lower-quality, noisy examples. Explain which strategy is likely to produce a better-performing model and justify your reasoning.
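One way to build intuition for this trade-off is a toy simulation (an illustration, not part of the card): if the noise in automatically generated data is partly *systematic* rather than random, no amount of extra volume averages it out, while a small clean set recovers the true behavior. The sketch below uses the card's 5,000 vs 100,000 sizes, a made-up ground-truth rule, and a hypothetical majority-vote "model" as a stand-in for fine-tuning.

```python
import random

random.seed(0)

def true_label(x):
    # ground-truth rule for this toy task: is x >= 5?
    return int(x >= 5)

def noisy_label(x):
    # systematic error: inputs 3-6 are *consistently* mislabeled,
    # mimicking widespread misinformation in scraped data
    y = true_label(x)
    return 1 - y if 3 <= x <= 6 else y

def majority_vote_model(pairs):
    # stand-in for fine-tuning: memorize the majority answer per input
    counts = {}
    for x, y in pairs:
        counts.setdefault(x, [0, 0])[y] += 1
    return {x: int(c[1] >= c[0]) for x, c in counts.items()}

# 5,000 clean examples vs 100,000 noisy examples
clean = [(x := random.randrange(10), true_label(x)) for _ in range(5_000)]
noisy = [(x := random.randrange(10), noisy_label(x)) for _ in range(100_000)]

def accuracy(model):
    test = range(10)
    return sum(model.get(x, 0) == true_label(x) for x in test) / 10

print(accuracy(majority_vote_model(clean)))  # 1.0: small clean set recovers the rule
print(accuracy(majority_vote_model(noisy)))  # 0.6: systematic noise persists at any scale
```

Note the asymmetry this exposes: purely random label noise below 50% can be voted away with enough data, but biased noise sets a ceiling on accuracy that more data cannot lift, which is one argument for the smaller, human-verified dataset in accuracy-critical settings.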
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating Data Strategies for Model Fine-Tuning
A development team is fine-tuning a language model for a specialized medical question-answering task where accuracy is critical. They have two potential datasets: Dataset A consists of 100,000 unfiltered Q&A pairs scraped from various online health forums. Dataset B consists of 5,000 Q&A pairs carefully curated and verified for accuracy by medical experts. Which statement best evaluates the most effective approach for the team?