1Cademy - Diagnosing a Flawed Fine-Tuning Dataset

Learn Before

Generating Fine-Tuning Data with Crowdsourced Questions and LLM-Generated Answers

Case Study

Diagnosing a Flawed Fine-Tuning Dataset

Based on the following case study, evaluate the startup's data generation strategy. What is the most likely flaw in their process that led to the chatbot's poor performance?

Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A company is building a specialized chatbot to provide users with reliable legal information. To create the training data, the team first gathers a large set of legal questions from the general public via an online platform. Next, they use a highly advanced, general-purpose language model to generate answers to all of these questions. These question-answer pairs are then used to fine-tune their new chatbot. Which of the following describes the most significant risk inherent in this specific data creation method?

AI Tutor Data Generation Strategy

Diagnosing a Flawed Fine-Tuning Dataset
