Multiple Choice

A development team is fine-tuning a large language model to serve as a general-purpose assistant capable of handling a wide variety of user queries. They are choosing between two datasets for this process:

  • Dataset A: A large dataset of 2 million examples, all focused on a single, complex task: summarizing scientific research papers.
  • Dataset B: A smaller dataset of 200,000 examples spread across 150 different tasks, such as question answering, creative writing, translation, and code generation.

Based on the principles of effective model fine-tuning, which dataset is more likely to produce a better general-purpose assistant, and why?

Updated 2025-10-03

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science