Essay

Analysis of Dataset Expansion Strategies

Consider two scenarios for expanding a dataset for a language-based machine learning task:

Scenario A: A team uses a powerful, general-purpose language model. They provide it with a few high-quality examples of an input and its desired output, and then prompt the model to generate thousands of new, similar input-output pairs for training.
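To make Scenario A concrete, the following is a minimal sketch of how the few seed examples might be assembled into a generation prompt. The function name `build_few_shot_prompt`, the prompt wording, and the seed examples are all hypothetical illustrations, not part of the original scenario; the call to the language model itself is deliberately left out, since the scenario does not specify which model or API the team uses.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], n_new: int) -> str:
    """Assemble a few-shot prompt asking a model for new input-output pairs.

    The wording of the instruction is an assumption made for illustration.
    """
    lines = [f"Generate {n_new} new input-output pairs in the same style as the examples."]
    for i, (inp, out) in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
    lines.append("New pairs:")
    return "\n".join(lines)


# Hypothetical seed examples for a sentiment task.
seed_examples = [
    ("The movie was wonderful.", "positive"),
    ("I want a refund.", "negative"),
]

prompt = build_few_shot_prompt(seed_examples, n_new=5)
print(prompt)
```

The resulting string would then be sent to whichever model the team has chosen, and the returned pairs would be parsed and filtered before being added to the training set.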

Scenario B: Another team starts with a set of sentences. To create more training data, they apply transformations to each sentence, such as replacing words with their synonyms or translating the sentence into another language and then back into the original (back-translation).
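The synonym-replacement transformation in Scenario B can be sketched as follows. This is a toy illustration under stated assumptions: the `SYNONYMS` table, the function name `synonym_replace`, and the replacement probability `p` are hypothetical, and a real pipeline would draw synonyms from a thesaurus or lexical database. Back-translation is not shown because it requires a translation model.

```python
import random
from typing import Dict, List

# Hypothetical, hand-written synonym table used only for illustration.
SYNONYMS: Dict[str, List[str]] = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def synonym_replace(sentence: str, rng: random.Random, p: float = 1.0) -> str:
    """Replace each word that has a known synonym with probability p."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))  # swap in a randomly chosen synonym
        else:
            out.append(word)  # keep words with no synonym entry unchanged
    return " ".join(out)


rng = random.Random(0)  # seeded so repeated runs give the same variants
original = "the quick dog is happy"
augmented = [synonym_replace(original, rng) for _ in range(3)]
print(augmented)
```

Each variant keeps the structure and meaning of the original sentence and changes only its surface wording, which is the sense in which this kind of augmentation stays close to the source data.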

Analyze the relationship between these two approaches. In your response, discuss their fundamental similarities in purpose and principle, as well as their key differences in terms of the novelty and diversity of the data they produce.

Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science