Learn Before
Power-Law Curve of Performance Scaling
A scaling-law curve, which plots test error against a variable of interest such as training dataset size, can typically be divided into three phases. In the first phase, test error decreases slowly. In the second phase, test error drops rapidly, following a power law. In the third phase, the reduction slows again as the model approaches an irreducible error that cannot be eliminated regardless of the amount of training data.
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A research team is training a large language model and has a fixed, non-negotiable computational budget. Their goal is to achieve the lowest possible final loss. Based on the established principles that govern the relationship between computation, model size, data size, and performance, which of the following strategies represents the most efficient use of their budget?
Evaluating an LLM Training Strategy
Analyzing Deviations from LLM Scaling Behavior
Continued Effectiveness of Scaling up Training in NLP
Power-Law Curve of Performance Scaling
Scaling Laws Across LLM Development Stages