A development team fine-tunes a large language model on a custom-built dataset of 50,000 technical support chat logs to improve its ability to resolve customer issues. The fine-tuned model achieves near-perfect accuracy on a test set composed of 5,000 additional logs from the same original source. However, when deployed to handle live customer chats, which include new and unforeseen types of user problems, the model's performance is significantly worse. Based on this scenario, which challenge associated with this improvement method is the most probable cause for the performance drop?
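The gap described in the scenario, between a held-out set drawn from the same source and live traffic, can be sketched with a toy stand-in. This is only an illustration under stated assumptions: the "model" is a lookup table rather than an LLM, and the issue types, resolutions, and the helper names `train_lookup` and `accuracy` are hypothetical and do not come from the scenario.

```python
from collections import Counter, defaultdict

def train_lookup(pairs):
    """Memorize the most common resolution seen for each issue type."""
    counts = defaultdict(Counter)
    for issue, resolution in pairs:
        counts[issue][resolution] += 1
    return {issue: c.most_common(1)[0][0] for issue, c in counts.items()}

def accuracy(model, pairs, fallback="escalate"):
    """Fraction of pairs whose resolution the model predicts exactly."""
    correct = sum(model.get(issue, fallback) == resolution
                  for issue, resolution in pairs)
    return correct / len(pairs)

# Hypothetical toy data: (issue type, correct resolution).
train = [("password_reset", "send_reset_link"),
         ("billing_error", "refund")] * 50

# Held-out logs from the same source cover the same issue types.
same_source_test = [("password_reset", "send_reset_link"),
                    ("billing_error", "refund")] * 5

# Live chats include issue types that never appeared in training.
live_chats = [("password_reset", "send_reset_link"),
              ("sso_outage", "check_identity_provider"),
              ("api_rate_limit", "raise_quota"),
              ("data_export", "enable_export_flag")]

model = train_lookup(train)
in_dist_acc = accuracy(model, same_source_test)
live_acc = accuracy(model, live_chats)

print(f"same-source test accuracy: {in_dist_acc:.2f}")
print(f"live accuracy: {live_acc:.2f}")
```

In this sketch the same-source score says little about live behavior, because both the training and test sets share the same narrow coverage of issue types.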
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Prioritizing Challenges in LLM Fine-Tuning
A research lab is working on improving a large language model's ability to solve complex mathematical word problems. Below are descriptions of three distinct problems they encountered during the project. Match each problem description to the most relevant challenge associated with training-based improvement methods.