Learn Before
Fitting LLM Learning Curves with Diverse Functions
Because simple monotonic functions such as power laws cannot capture every aspect of LLM learning, researchers have explored more flexible mathematical functions for modeling training curves. This approach, exemplified by Alabdulmohsin et al. [2022] and Caballero et al. [2023], aims to fit complex phenomena that standard scaling laws miss, such as error curves that temporarily rise before falling again.
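As a rough illustration of why a more flexible function can help, the sketch below compares a plain power law with a smoothly broken power law of the general shape associated with Caballero et al. [2023]. The function names, the exact parameterization, and every parameter value are assumptions chosen for illustration; nothing here is fitted to real data or taken from either paper's code.

```python
def power_law(n, a, b, c):
    """Monotonic baseline: a + b * n**(-c), assuming b > 0 and c > 0."""
    return a + b * n ** (-c)


def broken_power_law(n, a, b, c0, breaks):
    """Smoothly broken power law (assumed form):

        a + b * n**(-c0) * prod_i (1 + (n / d_i)**(1 / f_i))**(-c_i * f_i)

    `breaks` is a list of (c_i, d_i, f_i) tuples: c_i changes the log-log
    slope after the break, d_i is the break location, and f_i is its
    sharpness (smaller means sharper).
    """
    value = b * n ** (-c0)
    for c_i, d_i, f_i in breaks:
        value *= (1.0 + (n / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + value


def is_monotone_decreasing(values):
    """True if no value is larger than the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


# Hypothetical training-set sizes: 10, 100, ..., 1,000,000 examples.
grid = [10.0 ** k for k in range(1, 7)]

# Illustrative parameters only (not estimated from any experiment).
baseline = [power_law(n, a=0.1, b=1.0, c=0.5) for n in grid]
flexible = [
    broken_power_law(
        n, a=0.1, b=1.0, c0=0.5,
        # First break flips the slope upward, second break restores the decline.
        breaks=[(-1.0, 1e3, 0.1), (1.0, 1e4, 0.1)],
    )
    for n in grid
]

if __name__ == "__main__":
    for n, p, q in zip(grid, baseline, flexible):
        print(f"n={n:>9.0f}  power_law={p:.4f}  broken={q:.4f}")
    print("power law monotone:", is_monotone_decreasing(baseline))
    print("broken power law monotone:", is_monotone_decreasing(flexible))
```

With these made-up parameters, the power law can only decrease, while the two-break curve falls, rises between the break points, and then falls again. In practice the parameters would be estimated from measured errors with a nonlinear least-squares routine rather than set by hand.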
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Fitting LLM Learning Curves with Diverse Functions
A research team is modeling the performance of a large language model as they increase the amount of training data. Their predictive model, based on a standard power-law function, anticipates a steady, continuous improvement in performance. However, their experiments show that the model's error rate first decreases, then temporarily increases, before decreasing again. Which statement best analyzes the limitation of their predictive model in this context?
Evaluating a Predictive Model for LLM Training
Predicting Complex Learning Dynamics
Learn After
Power Law Formulation for LLM Loss
A research team observes the performance of a new model as they increase the amount of training data. They plot the test error and notice a distinct pattern: the error first decreases, then temporarily increases, before starting to decrease again. When attempting to create a mathematical formula to predict this error curve, which approach would be most appropriate?
Modeling an Unconventional Learning Curve
Evaluating Modeling Approaches for LLM Learning Curves