Case Study

Choosing a Fine-Tuning Dataset for a Medical Chatbot

A healthcare startup aims to build a chatbot that can answer patient questions about common medications. They have a powerful, pre-trained language model but need to specialize it for this task. They are considering two datasets for the fine-tuning process:

  • Dataset A: A comprehensive medical encyclopedia containing thousands of pages of text describing medications, their uses, and side effects.
  • Dataset B: A collection of 50,000 question-and-answer pairs, where each question is a common patient query (e.g., 'What are the side effects of ibuprofen?') and the answer is a concise, accurate response.
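To make the structural difference between the two datasets concrete, here is a minimal sketch of how one record from each might be turned into a training example. The names `TrainingExample`, `from_encyclopedia`, and `from_qa_pair`, the `Question:`/`Answer:` template, and the bracketed placeholder strings are all hypothetical and are not part of the case study. Character offsets stand in for token offsets to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    text: str        # full string that would be fed to the model
    loss_start: int  # character offset where the training target begins (simplification of a token mask)


def from_encyclopedia(passage: str) -> TrainingExample:
    # Dataset A style: plain expository prose with no question/answer boundary,
    # so the whole passage is treated as the prediction target.
    return TrainingExample(text=passage, loss_start=0)


def from_qa_pair(question: str, answer: str) -> TrainingExample:
    # Dataset B style: the question serves as the input context and
    # only the answer is treated as the prediction target.
    prompt = f"Question: {question}\nAnswer: "
    return TrainingExample(text=prompt + answer, loss_start=len(prompt))


# Placeholder content only; a real pipeline would load records from the datasets.
example_a = from_encyclopedia("<encyclopedia passage about ibuprofen>")
example_b = from_qa_pair(
    "What are the side effects of ibuprofen?",
    "<concise, accurate answer>",
)

# The target portion of each example:
target_a = example_a.text[example_a.loss_start:]
target_b = example_b.text[example_b.loss_start:]
```

In this sketch, a Dataset B record carries an explicit input-to-output mapping, while a Dataset A record is a single block of undifferentiated text.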

Which dataset should the startup choose to fine-tune its model most effectively for the question-answering task? Justify your choice by explaining how the structure of the selected dataset facilitates the learning process.


Updated 2025-10-05


Tags

Ch.4 Alignment - Foundations of Large Language Models
