Short Answer

Evaluating Pre-training Scenarios

Consider two scenarios:

Scenario A: A team is building a system to classify legal documents into one of 500 highly specific, proprietary categories. They have a massive, well-labeled dataset of 10 million documents.

Scenario B: A team is building a chatbot to answer questions about a new software product. They have a small, curated dataset of only 500 question-and-answer pairs.

Which scenario would benefit more from starting with a large, general-purpose pre-trained model? Justify your answer by explaining the relationship between the amount of available task-specific data and the primary advantage of the pre-training approach.

Updated 2025-10-02

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science