Essay

Evaluating Data Generation Strategies for Model Generalization

A company is developing a customer support chatbot. They plan to fine-tune a large language model using synthetically generated data. They are considering two strategies for creating the input prompts that will be fed to a generator model:

Strategy 1: Use their entire historical log of 500,000 customer support tickets as the input prompts.

Strategy 2: Use a set of 20,000 highly varied and unconventional prompts designed by a team of creative writers to mimic unexpected user behavior.

Evaluate these two strategies. In your response, argue which strategy is more likely to result in a chatbot that generalizes well to a wide range of future, real-world user queries. Justify your conclusion by explaining the potential risks associated with the less effective strategy.

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science