Learn Before
Predicting Complex Learning Dynamics
A research team is training a large language model and observes that the model's error rate on a validation set first decreases, then briefly increases, and finally decreases again as the training dataset size grows. Explain why a predictive model based on a simple, monotonic power-law function would be unable to accurately forecast this specific performance trend.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Fitting LLM Learning Curves with Diverse Functions
A research team is modeling the performance of a large language model as they increase the amount of training data. Their predictive model, based on a standard power-law function, anticipates a steady, continuous improvement in performance. However, their experiments show that the model's error rate first decreases, then temporarily increases, before decreasing again. Which statement best analyzes the limitation of their predictive model in this context?
Evaluating a Predictive Model for LLM Training
Predicting Complex Learning Dynamics