Evaluating Data Generation Strategies for Model Generalization
A company is developing a customer support chatbot. They plan to fine-tune a large language model using synthetically generated data. They are considering two strategies for creating the input prompts that will be fed to a generator model:
Strategy 1: Use their entire historical log of 500,000 customer support tickets as the input prompts.
Strategy 2: Use a set of 20,000 highly varied and unconventional prompts designed by a team of creative writers to mimic unexpected user behavior.
Evaluate these two strategies. In your response, argue which strategy is more likely to result in a chatbot that generalizes well to a wide range of future, real-world user queries. Justify your conclusion by explaining the potential risks associated with the less effective strategy.
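One way to ground this comparison is to quantify how repetitive each prompt set is before feeding it to a generator. The sketch below uses a crude lexical-diversity proxy (the fraction of distinct word bigrams); the two tiny prompt lists are hypothetical stand-ins for the strategies above, not real data, and the metric is purely illustrative.

```python
def distinct_bigram_ratio(prompts):
    """Fraction of unique word bigrams across a prompt set — a crude
    proxy for input diversity; higher means less repetition."""
    bigrams = []
    for p in prompts:
        words = p.lower().split()
        bigrams.extend(zip(words, words[1:]))
    return len(set(bigrams)) / max(len(bigrams), 1)

# Hypothetical samples: historical tickets tend to cluster around the
# same few intents, while writer-crafted prompts are deliberately varied.
strategy_1 = [
    "my order has not arrived yet",
    "my order has not arrived please help",
    "order has not arrived where is it",
]
strategy_2 = [
    "why does your app think my cat is a billing address",
    "explain my invoice as if I were a medieval knight",
    "I typed my complaint in emoji, can you still help",
]

print(f"Strategy 1 diversity: {distinct_bigram_ratio(strategy_1):.2f}")
print(f"Strategy 2 diversity: {distinct_bigram_ratio(strategy_2):.2f}")
```

A large but redundant prompt set can score lower on such a measure than a small, deliberately varied one, which is the crux of the trade-off the question asks you to evaluate.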
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Generating Inputs and Outputs for Comprehensive Fine-Tuning
Chatbot Performance Analysis
A development team is fine-tuning a large language model to act as a technical support chatbot. To create a large training dataset, they use a powerful base model to generate responses to a set of 10,000 technical questions curated by their internal support staff. After deployment, the chatbot excels at answering questions similar to those in the curated set but struggles significantly with novel or unusually phrased queries from real users. Which of the following best analyzes the primary weakness in their data generation strategy?
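The failure mode in this scenario can be sketched numerically: if the curated questions cover only a narrow slice of real phrasings, many live queries will have no close match in the training set. The snippet below uses word-level Jaccard similarity as a rough coverage check; all names, example strings, and the threshold are illustrative assumptions, not from any specific system.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def coverage(train_questions, user_queries, threshold=0.3):
    """Fraction of user queries whose best match in the training
    set meets the similarity threshold."""
    covered = sum(
        1 for q in user_queries
        if max(jaccard(q, t) for t in train_questions) >= threshold
    )
    return covered / len(user_queries)

curated = [  # hypothetical staff-curated questions
    "how do I reset my password",
    "how do I update my billing information",
]
real_users = [
    "how do I reset my password",                  # in-distribution
    "pwd reset not working after the new update",  # novel phrasing
]

print(f"coverage: {coverage(curated, real_users):.2f}")
```

The novel phrasing falls below the threshold even though it expresses the same intent, mirroring the chatbot's struggle with unusually phrased queries despite a large curated dataset.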