Learn Before
Choosing a Fine-Tuning Dataset for a Medical Chatbot
A healthcare startup aims to build a chatbot that can answer patient questions about common medications. They have a powerful, pre-trained language model but need to specialize it for this task. They are considering two datasets for the fine-tuning process:
- Dataset A: A comprehensive medical encyclopedia containing thousands of pages of text describing medications, their uses, and side effects.
- Dataset B: A collection of 50,000 question-and-answer pairs, where each question is a common patient query (e.g., 'What are the side effects of ibuprofen?') and the answer is a concise, accurate response.
Which dataset should the startup choose to most effectively train their model for the question-answering task? Justify your choice by explaining how the structure of the selected dataset facilitates the learning process.
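To make the contrast concrete, here is a minimal sketch of how question-answer pairs like those in Dataset B are typically shaped into supervised fine-tuning records, where each example pairs an input (the patient query) with a target output (the answer). The record format, field names, and sample pairs below are illustrative assumptions, not a specific platform's API:

```python
import json

# Illustrative sample in the spirit of Dataset B: patient questions
# paired with concise answers (contents are hypothetical examples).
qa_pairs = [
    {"question": "What are the side effects of ibuprofen?",
     "answer": "Common side effects include upset stomach, heartburn, and dizziness."},
    {"question": "Can I take acetaminophen with food?",
     "answer": "Yes, acetaminophen can be taken with or without food."},
]

def to_finetuning_records(pairs):
    """Convert QA pairs into prompt/completion records.

    This input-output structure is what supervised fine-tuning learns
    from directly: the model is trained to produce the completion when
    given the prompt, matching the chatbot's deployment task.
    """
    return [
        {"prompt": f"Patient: {p['question']}\nAssistant:",
         "completion": f" {p['answer']}"}
        for p in pairs
    ]

records = to_finetuning_records(qa_pairs)
for r in records:
    print(json.dumps(r))  # one JSON record per line (JSONL-style)
```

An encyclopedia like Dataset A, by contrast, provides no such input-output pairing, so it supports continued pre-training on domain text rather than directly teaching the question-answering behavior.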
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team fine-tunes a general-purpose, pre-trained language model using a dataset of 1,000 specific question-and-answer pairs related to their new software product. The goal is to create a helpful product support chatbot. Which statement best predicts the model's capability after this fine-tuning process?
Choosing a Fine-Tuning Dataset for a Medical Chatbot
A large language model is fine-tuned exclusively on a dataset containing 50,000 question-answer pairs about world history. After this training, the model will only be able to provide correct answers to those specific 50,000 questions and will fail on any new, unseen history questions.