Case Study

Data Generation Strategy for a Specialized AI Assistant

A startup is building an AI assistant to provide technical support for a complex software product. They have a limited budget for creating the data needed to train their model. They are considering two options:

  1. Hiring a small team of expert software engineers to manually write 5,000 high-quality question-and-answer pairs.
  2. Using a powerful, general-purpose language model to automatically generate 100,000 question-and-answer pairs based on the software's documentation.

Evaluate the two options. Which strategy would you recommend for the startup? Justify your recommendation by analyzing the key trade-offs between the two approaches regarding data scale, cost, and potential quality.
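One way to organize the scale, cost, and quality comparison is a small back-of-the-envelope calculation. The sketch below is only an illustration: the pair counts come from the two options above, but the cost figures and usable fractions are hypothetical placeholders, not data from the case study, and should be replaced with the startup's own estimates.

```python
from dataclasses import dataclass


@dataclass
class DataOption:
    """One data-generation strategy. Only num_pairs comes from the case study."""
    name: str
    num_pairs: int          # pairs produced (5,000 or 100,000 in the case study)
    total_cost: float       # HYPOTHETICAL total spend, in any currency unit
    usable_fraction: float  # HYPOTHETICAL share of pairs that are correct and relevant

    def usable_pairs(self) -> float:
        # Pairs expected to survive a quality check.
        return self.num_pairs * self.usable_fraction

    def noisy_pairs(self) -> float:
        # Pairs expected to be wrong or off-topic if left unfiltered.
        return self.num_pairs - self.usable_pairs()

    def cost_per_usable_pair(self) -> float:
        return self.total_cost / self.usable_pairs()


# Placeholder numbers chosen only to make the example run.
expert = DataOption("expert-written", 5_000, total_cost=100_000.0, usable_fraction=0.98)
synthetic = DataOption("model-generated", 100_000, total_cost=5_000.0, usable_fraction=0.70)

for opt in (expert, synthetic):
    print(
        f"{opt.name}: usable={opt.usable_pairs():.0f}, "
        f"noisy={opt.noisy_pairs():.0f}, "
        f"cost per usable pair={opt.cost_per_usable_pair():.2f}"
    )
```

Changing `usable_fraction` for the model-generated option shows how sensitive the recommendation is to the quality of the generated data. The same calculation can be extended to cover a hybrid strategy, for example a small expert-written set used to filter or validate the generated pairs.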

Updated 2025-10-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science