1Cademy - Choosing a Fine-Tuning Dataset for a Medical Chatbot

Learn Before

Example of SFT: Question-Answering Task

Case Study

Choosing a Fine-Tuning Dataset for a Medical Chatbot

A healthcare startup aims to build a chatbot that can answer patient questions about common medications. They have a powerful, pre-trained language model but need to specialize it for this task. They are considering two datasets for the fine-tuning process:

Dataset A: A comprehensive medical encyclopedia containing thousands of pages of text describing medications, their uses, and side effects.
Dataset B: A collection of 50,000 question-and-answer pairs, where each question is a common patient query (e.g., 'What are the side effects of ibuprofen?') and the answer is a concise, accurate response.
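To make the structural difference between the two datasets concrete, here is a minimal Python sketch of how one item from each might be turned into a training example for a causal language model. Several details are illustrative assumptions and do not come from the case study: the whitespace `ToyTokenizer`, the `Question: ... Answer:` prompt template, and the helper names `plain_text_example` and `qa_example`. The same goes for the `-100` label, which follows PyTorch's default `ignore_index` convention for excluding positions from the loss. Labels are aligned with input positions, and the sketch assumes the one-token shift happens inside the loss, as it does in many causal-LM training loops.

```python
from dataclasses import dataclass

# Label value that the loss skips (assumption: PyTorch's default ignore_index).
IGNORE = -100


@dataclass
class Example:
    input_ids: list[int]  # tokens the model reads
    labels: list[int]     # per-position targets; IGNORE means "no loss here"


class ToyTokenizer:
    """Hypothetical whitespace tokenizer that assigns ids in first-seen order."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        # setdefault gives each new token the next free id
        return [self.vocab.setdefault(t, len(self.vocab)) for t in text.split()]


def plain_text_example(tok: ToyTokenizer, passage: str) -> Example:
    # Encyclopedia-style text (Dataset A): every token is a target,
    # and nothing marks which part is a question or an answer.
    ids = tok.encode(passage)
    return Example(input_ids=ids, labels=list(ids))


def qa_example(tok: ToyTokenizer, question: str, answer: str) -> Example:
    # Question-answer pair (Dataset B): the prompt is context only,
    # so the loss is computed on the answer tokens alone.
    prompt_ids = tok.encode(f"Question: {question} Answer:")
    answer_ids = tok.encode(answer)
    return Example(
        input_ids=prompt_ids + answer_ids,
        labels=[IGNORE] * len(prompt_ids) + answer_ids,
    )


if __name__ == "__main__":
    tok = ToyTokenizer()
    ex = qa_example(
        tok,
        "What are the side effects of ibuprofen?",
        "Common side effects include stomach upset.",
    )
    print(ex.input_ids)
    print(ex.labels)
```

In the question-answer example, each training signal explicitly maps a patient-style query to its response. The encyclopedia passage yields a target at every position but carries no explicit query-to-response mapping.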

Which dataset should the startup choose to most effectively train their model for the question-answering task? Justify your choice by explaining how the structure of the selected dataset facilitates the learning process.

Updated 2025-10-05