Learn Before
A machine learning team is tasked with fine-tuning a general-purpose language model to specialize in summarizing complex scientific research papers. The team has access to a massive dataset of papers but has a very limited budget for computation, allowing them to use only a small fraction of the available data for training. Their primary objective is to achieve the highest possible summarization quality given these constraints. Which data selection strategy should the team prioritize to most effectively achieve their goal?
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning team is tasked with fine-tuning a general-purpose language model to specialize in summarizing complex scientific research papers. The team has access to a massive dataset of papers but has a very limited budget for computation, allowing them to use only a small fraction of the available data for training. Their primary objective is to achieve the highest possible summarization quality given these constraints. Which data selection strategy should the team prioritize to most effectively achieve their goal?
Fine-Tuning Efficiency and Performance
Trade-offs in Data Curation for Model Training