Case Study

Selecting a Fine-Tuning Dataset for a Customer Support Chatbot

A software company wants to fine-tune a general-purpose language model to create a chatbot that helps users troubleshoot common software issues. The company has two datasets available:

  • Dataset A: The complete user manual and all official technical documentation for the software, formatted as a large collection of articles and guides.
  • Dataset B: A collection of 50,000 anonymized transcripts from past support chats between human agents and customers, formatted as a series of "User:" and "Agent:" turns.
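To make the "User:" / "Agent:" turn format concrete, the sketch below shows one way such a transcript could be split into context/response pairs for supervised fine-tuning. It is a minimal illustration under stated assumptions, not the company's actual pipeline: the function names `parse_turns` and `to_training_pairs` and the sample transcript are hypothetical, and real transcripts would likely need more cleaning.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def parse_turns(transcript: str) -> List[Turn]:
    """Split a transcript into (speaker, text) turns.

    Assumes every turn starts with "User:" or "Agent:"; any other
    non-empty line is treated as a continuation of the previous turn.
    """
    turns: List[Turn] = []
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        for speaker in ("User", "Agent"):
            prefix = speaker + ":"
            if line.startswith(prefix):
                turns.append((speaker, line[len(prefix):].strip()))
                break
        else:
            # No speaker prefix: append to the previous turn, if there is one.
            if turns:
                prev_speaker, prev_text = turns[-1]
                turns[-1] = (prev_speaker, prev_text + " " + line)
    return turns


def to_training_pairs(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Build one (context, response) pair per agent turn.

    The context is the full conversation so far, ending with "Agent:",
    and the response is what the human agent actually said next.
    """
    pairs: List[Tuple[str, str]] = []
    for i, (speaker, text) in enumerate(turns):
        if speaker == "Agent" and i > 0:
            history = "\n".join(f"{s}: {t}" for s, t in turns[:i])
            pairs.append((history + "\nAgent:", text))
    return pairs


# Hypothetical transcript, invented purely for illustration.
SAMPLE = """User: The app crashes when I export a report.
Agent: Sorry about that. Which version are you running?
User: Version 4.2.
Agent: Please update to the latest release and try again."""

if __name__ == "__main__":
    for context, response in to_training_pairs(parse_turns(SAMPLE)):
        print(context, response, sep=" ")
        print("---")
```

Each agent turn becomes a target conditioned on the conversation so far. The articles in Dataset A have no turn structure, so they would need a different preparation step.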

Which dataset should the company primarily use to fine-tune its model into an effective, conversational support chatbot? Justify your choice by explaining how the structure and content of the selected dataset contribute to the desired chatbot behavior.

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Ch.4 Alignment - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science