Learn Before
Evaluating Fine-Tuning Datasets for a General-Purpose AI
A startup is developing a general-purpose AI assistant. The team must choose between two fine-tuning datasets for their base language model, both of the same total size.
- Dataset X: Contains 500,000 high-quality examples of a single task: summarizing news articles. Each example is framed as an instruction (e.g., 'Summarize the following text: ...').
- Dataset Y: Contains 500,000 examples spread across 20 different tasks, including summarization, translation, question answering, and creative writing. Each example is also framed as an instruction.
Evaluate which dataset is more suitable for creating the general-purpose AI assistant. Justify your choice by explaining how the composition of the fine-tuning data influences the model's ability to handle a wide range of user requests.
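To make the contrast concrete, here is a minimal sketch of how the two dataset compositions might be represented. The JSONL-style record format, the helper function, and the specific task instructions are illustrative assumptions, not part of the question; the sample size is shrunk from 500,000 to 1,000 purely for illustration.

```python
import random

random.seed(0)  # deterministic sampling for the sketch

N = 1_000  # 500,000 in the scenario; small here for illustration

# Hypothetical instruction-formatted record; the exact schema is an assumption.
def make_example(task, instruction, source, target):
    return {"task": task, "instruction": f"{instruction}\n\n{source}", "output": target}

# Dataset X: one task only (news summarization).
dataset_x = [
    make_example("summarization", "Summarize the following text:",
                 f"article {i}", f"summary {i}")
    for i in range(N)
]

# Dataset Y: same total size, spread across many tasks
# (4 shown here; the scenario has 20).
TASKS = [
    ("summarization", "Summarize the following text:"),
    ("translation", "Translate the following text into French:"),
    ("question_answering", "Answer the question using the passage:"),
    ("creative_writing", "Write a short story about:"),
]

dataset_y = [
    make_example(task, instruction, f"input {i}", f"output {i}")
    for i, (task, instruction) in enumerate(random.choices(TASKS, k=N))
]

# Same size, very different task diversity.
assert len(dataset_x) == len(dataset_y) == N
assert len({ex["task"] for ex in dataset_x}) == 1  # single task
assert len({ex["task"] for ex in dataset_y}) > 1   # multiple tasks
```

The key variable the question asks you to evaluate is exactly this difference in task diversity at constant dataset size.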
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team fine-tunes two identical large language models. Model A is fine-tuned exclusively on 100,000 examples of text summarization, each presented as an instruction. Model B is fine-tuned on a dataset of the same total size (100,000 examples), but that dataset mixes summarization, translation, and question-answering tasks, all framed as instructions. When both models are tested on a completely new task, sentiment analysis, Model B performs significantly better than Model A, which fails almost completely. What is the most likely reason for Model B's superior ability to generalize to the new task?
AI Assistant Fine-Tuning Strategy
Evaluating Fine-Tuning Datasets for a General-Purpose AI