A research team is working to improve a large language model and is using the combined power law, L(N,D) = aN^b + cD^d + ε_∞, to guide their efforts. Their analysis shows that the term aN^b, which depends on the model's parameter count (N), is currently the largest contributor to the total loss. The term cD^d, which depends on the dataset size (D), is comparatively small. To achieve the most significant reduction in loss with their limited resources, what should the team prioritize?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chinchilla Scaling Law
Diagnosing LLM Training Plateaus
Optimizing LLM Training Strategy
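The trade-off in the question can be sketched numerically. The sketch below uses Chinchilla-style illustrative coefficients (a, b, c, d, ε_∞ and the chosen N, D are assumptions for demonstration, not fitted values for any particular model); with negative exponents b and d, each term shrinks as its resource grows, and when aN^b dominates the loss, scaling N yields the larger reduction:

```python
# Combined power law L(N, D) = a*N^b + c*D^d + eps_inf,
# with b < 0 and d < 0 so each term decays as its resource grows.
# All coefficients below are illustrative assumptions, not fitted values.
a, b = 406.4, -0.34    # parameter-count term a*N^b
c, d = 410.7, -0.28    # dataset-size term c*D^d
eps_inf = 1.69         # irreducible loss floor

def loss(N, D):
    """Predicted loss for N parameters trained on D tokens."""
    return a * N**b + c * D**d + eps_inf

# Hypothetical current budget: 100M parameters, 20B tokens.
N, D = 1e8, 2e10

term_N = a * N**b      # contribution of the parameter-count term
term_D = c * D**d      # contribution of the dataset-size term

# When term_N dominates, doubling N cuts the loss more than doubling D.
gain_from_2N = loss(N, D) - loss(2 * N, D)
gain_from_2D = loss(N, D) - loss(N, 2 * D)
print(f"term_N={term_N:.3f}, term_D={term_D:.3f}")
print(f"gain from 2x params: {gain_from_2N:.3f}")
print(f"gain from 2x data:   {gain_from_2D:.3f}")
```

Under these assumed coefficients the parameter term is the larger contributor, and doubling N reduces the loss by more than doubling D, which is the reasoning the question asks for: prioritize scaling the dominant term.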