Interpreting Pre-training Metrics for Specialized Tasks
A data science team has developed two large language models. Model A achieves a significantly lower test loss on a general web-text corpus than Model B. The team plans to deploy one of these models for a highly specialized task: generating medical diagnostic reports. Explain why the team should not select a model based solely on the lower test loss, and what other factors they must consider.
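The gap the question points at can be made concrete with a toy experiment: a model tuned to general text can beat a domain-aware model on general-corpus loss yet lose badly on domain text. The sketch below uses hypothetical Laplace-smoothed unigram models and invented toy corpora (all names and data are illustrative, not from the question) to show that ranking by general test loss and ranking by domain loss can disagree.

```python
import math
from collections import Counter

def train_unigram(corpus, vocab):
    # Laplace-smoothed unigram probabilities over a fixed shared vocabulary.
    counts = Counter(corpus)
    total = len(corpus) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def cross_entropy(model, corpus):
    # Average negative log-probability per token (nats) -- the "test loss".
    return -sum(math.log(model[w]) for w in corpus) / len(corpus)

# Hypothetical toy corpora (illustrative only).
general = "the cat sat on the mat the dog ran".split()
medical = "patient presents with acute pain diagnosis pending".split()
vocab = set(general) | set(medical)

model_a = train_unigram(general * 3, vocab)            # heavily tuned to general text
model_b = train_unigram(general + medical * 3, vocab)  # has seen domain text

# Model A wins on the general corpus...
print(cross_entropy(model_a, general) < cross_entropy(model_b, general))
# ...but Model B wins on the specialized domain.
print(cross_entropy(model_b, medical) < cross_entropy(model_a, medical))
```

The rankings flip because general-corpus loss measures fit to the general distribution, not to the deployment distribution; the same argument scales up to large language models and motivates domain-specific evaluation before selecting a model.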
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Task-Specific Nature of Scaling Laws
A research lab pre-trains two language models, Model Alpha and Model Beta, on the same large text corpus. Model Alpha achieves a final test loss of 1.8, while Model Beta achieves a final test loss of 2.5. However, when both models are later adapted for a specialized legal document summarization task, Model Beta significantly outperforms Model Alpha. Which of the following statements provides the most likely explanation for this discrepancy?
Evaluating Model Selection Strategy
Model Selection for a Specialized Task