Learn Before
Limitations of Monotonic Scaling Functions
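The scaling laws commonly used to model LLM performance, such as power laws, are monotonic functions. As model size, training data, or compute grows, the predicted loss can only keep decreasing. A significant limitation of this approach is that such functions cannot capture non-monotonic learning dynamics, where the curve changes direction at turning points. One example is the double descent phenomenon, in which error first decreases, then temporarily rises, and then decreases again.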
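The limitation can be checked numerically. The sketch below is a hypothetical illustration, not a method from the source. It fits the pure power law L(n) = a * n^(-alpha) to a synthetic error curve that has a double-descent bump, then tests both curves for monotonicity. For simplicity it assumes the irreducible-loss term c is 0, which lets it fit by linear regression in log-log space. Because dL/dn = -a * alpha * n^(-alpha - 1) never changes sign, the fitted curve is monotone no matter what shape the data has. The curve shape, the bump location (n = 1e3), and the helper names are all assumptions made for illustration.

```python
import math

def power_law(n, a, alpha, c=0.0):
    """Scaling-law form L(n) = a * n**(-alpha) + c; monotone in n for any a, alpha."""
    return a * n ** (-alpha) + c

def fit_power_law(ns, losses):
    """Least-squares fit of log L = log a - alpha * log n.

    Assumes c = 0 and all losses > 0 (a simplification for this sketch).
    Returns (a, alpha).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope

def is_monotone(values):
    """True if the sequence never changes direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d <= 0 for d in diffs) or all(d >= 0 for d in diffs)

def double_descent_curve(n):
    """Hypothetical error curve: power-law decay plus a bump centred at n = 1e3."""
    bump = 0.6 * math.exp(-((math.log10(n) - 3.0) ** 2) / 0.1)
    return 2.0 * n ** (-0.3) + bump

# Log-spaced "training sizes" from 1e1 to 1e5 (illustrative values only).
ns = [10 ** (1 + 0.25 * i) for i in range(17)]
observed = [double_descent_curve(n) for n in ns]
a, alpha = fit_power_law(ns, observed)
fitted = [power_law(n, a, alpha) for n in ns]

print(is_monotone(observed))  # False: error falls, rises near n = 1e3, then falls
print(is_monotone(fitted))    # True: the power law cannot reproduce the bump
```

The fit is not wrong in a numerical sense. It simply has no parameter that can express a temporary increase, so the residuals concentrate around the bump. Adding the c term or fitting in linear space would not change this, because every member of the family stays monotone.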
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Limitation of Test Loss in Predicting Downstream Performance
A research team develops a scaling function that accurately predicts their language model's performance on English text as they increase the model's parameter count. Confident in their findings, they use the same function to budget for a new, larger model intended for generating computer code. However, the final code-generation model performs significantly worse than the function predicted. Which statement best explains this outcome?
Evaluating a Compute Budgeting Strategy
A research lab has developed a scaling function that accurately predicts the performance of their specific 10-billion parameter language model on a large corpus of web text. This function can therefore be considered a reliable predictor for the performance of any other 10-billion parameter language model trained on a different large corpus of web text.
Learn After
Fitting LLM Learning Curves with Diverse Functions
A research team is modeling the performance of a large language model as they increase the amount of training data. Their predictive model, based on a standard power-law function, anticipates a steady, continuous improvement in performance. However, their experiments show that the model's error rate first decreases, then temporarily increases, before decreasing again. Which statement best analyzes the limitation of their predictive model in this context?
Evaluating a Predictive Model for LLM Training
Predicting Complex Learning Dynamics