Learn Before
Analyzing Performance Limits in Language Models
A research team has access to a virtually limitless supply of high-quality training data. They train a language model and find that, after training on trillions of tokens, the model's performance on a test set stops improving and plateaus at a certain loss value. Based on the principle that test loss decomposes into separate, additive terms for model size, dataset size, and an irreducible constant error, what are the two remaining factors that define this performance plateau? Explain why simply adding more data does not lead to further improvement.
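A minimal sketch of the decomposition the question assumes, written in the common Chinchilla-style form (the symbols E, A, B, α, and β are illustrative stand-ins here, not values given in the card):

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

With unlimited data, the dataset term vanishes in the limit:

$$\lim_{D \to \infty} L(N, D) = E + \frac{A}{N^{\alpha}}$$

so no amount of additional data can push the loss below the sum of the model-size term and the irreducible error E.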
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chinchilla Scaling Law Formula
A research team is training a large language model and observes that the model's performance, measured by test loss, seems to be limited primarily by the number of model parameters rather than by the amount of training data. According to the principle that decomposes test loss into two separate terms, one that decreases as model size grows and another that decreases as dataset size grows, which of the following actions would most effectively reduce the test loss in this situation?
Critique of a Model Training Strategy