Analyzing the Performance Plateau in Model Scaling
Imagine a team of engineers training a large language model. After a long period of rapid improvement from adding more and more training data, the model's error rate on a fixed test set has flattened out. Even doubling the training dataset again yields only a negligible improvement. Analyze the fundamental factors that could be contributing to this performance plateau, and explain why simply adding more data is no longer effective.
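To make the scenario concrete, here is a minimal sketch of one common way to model such a curve: test error as an irreducible floor plus a power-law term that shrinks with dataset size. The functional form and every constant (`floor`, `scale`, `exponent`) are illustrative assumptions, not values given in this exercise, and the sketch covers only the data axis, not model capacity or data quality.

```python
def loss(num_tokens, floor=1.7, scale=400.0, exponent=0.3):
    """Hypothetical data-scaling curve: irreducible floor + power-law term.

    All parameter values are made up for illustration only.
    """
    return floor + scale / (num_tokens ** exponent)


def gain_from_doubling(num_tokens, **params):
    """Reduction in loss obtained by doubling the dataset at a given size."""
    return loss(num_tokens, **params) - loss(2 * num_tokens, **params)


if __name__ == "__main__":
    # The gain from each doubling shrinks as the dataset grows,
    # and the loss never drops below the assumed floor.
    for d in (1e6, 1e8, 1e10, 1e12):
        print(f"tokens={d:.0e}  loss={loss(d):.4f}  "
              f"gain_from_doubling={gain_from_doubling(d):.4f}")
```

Under these assumptions, the same doubling that once cut the error substantially later changes it only marginally, because the remaining error is dominated by the floor term rather than by the amount of data.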
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is developing a language model. They progressively increase the model's size and the amount of training data, observing that performance gains diminish significantly with each increase. The largest model shows almost no improvement over the second-largest, despite being much bigger. What is the most likely reason for this plateau in performance?
Strategic Decision for a Stagnant LLM