Learn Before
Selecting a Fine-Tuning Dataset for a Customer Support Chatbot
A software company wants to fine-tune a general-purpose language model to create a chatbot that helps users troubleshoot common software issues. They have two datasets available:
- Dataset A: The complete user manual and all official technical documentation for the software, formatted as a large collection of articles and guides.
- Dataset B: A collection of 50,000 anonymized transcripts from past support chats between human agents and customers, formatted as a series of "User:" and "Agent:" turns.
Which dataset should the company primarily use to fine-tune their model to be an effective, conversational support chatbot? Justify your choice by explaining how the structure and content of the selected dataset contribute to the desired chatbot behavior.
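To make the "User:" / "Agent:" turn format described for Dataset B concrete, the following is a minimal sketch of how such a transcript might be split into role-tagged turns and then into (conversation history, agent reply) examples. The function names `parse_transcript` and `to_training_pairs`, the mapping of "Agent:" to an `assistant` role, and the sample transcript are all hypothetical illustrations, not part of the original scenario or any specific library.

```python
from typing import Dict, List

# Assumed mapping from transcript prefixes to role names (illustrative only).
PREFIXES = {"User:": "user", "Agent:": "assistant"}


def parse_transcript(text: str) -> List[Dict[str, str]]:
    """Split a 'User:'/'Agent:' transcript into a list of role-tagged turns."""
    turns: List[Dict[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        for prefix, role in PREFIXES.items():
            if line.startswith(prefix):
                turns.append({"role": role, "content": line[len(prefix):].strip()})
                break
        else:
            # No prefix matched: treat as a continuation of the previous turn.
            if turns:
                turns[-1]["content"] += " " + line
    return turns


def to_training_pairs(turns: List[Dict[str, str]]) -> List[dict]:
    """Pair each agent reply with the conversation history that precedes it."""
    pairs = []
    for i, turn in enumerate(turns):
        if turn["role"] == "assistant" and i > 0:
            pairs.append({"context": turns[:i], "target": turn["content"]})
    return pairs


# Invented sample transcript, for illustration only.
sample = """User: The app crashes when I open settings.
Agent: Sorry about that. Which version are you running?
User: Version 2.1.
Agent: Please update to the latest release and try again."""

turns = parse_transcript(sample)
pairs = to_training_pairs(turns)
print(len(turns), "turns;", len(pairs), "training pairs")
```

Each resulting pair keeps the full preceding dialogue as context, so a multi-turn chat yields one example per agent reply.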
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team aims to create a helpful technical support chatbot. They train a general-purpose language model on a large dataset consisting solely of their product's technical manuals. When tested, the model provides factually correct information but fails to engage in natural, back-and-forth conversation. Which of the following changes to the training data is most likely to improve the chatbot's conversational ability?
Crafting Training Data for a Specialized Chatbot