Learn Before
Critique of a Model Training Strategy
Based on the principle that a model's final performance is determined by separate, additive contributions from both model size and dataset size, analyze the potential flaw in the following resource allocation strategy. Explain why this approach might not be optimal for minimizing test loss.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Chinchilla Scaling Law Formula
A research team is training a large language model and observes that the model's performance, measured by test loss, seems to be limited primarily by the number of model parameters rather than by the amount of training data. According to the scaling law that expresses test loss as the sum of two separate terms (one that decreases as model size grows and another that decreases as dataset size grows), which of the following actions would most effectively reduce the test loss in this specific situation?
Critique of a Model Training Strategy
Analyzing Performance Limits in Language Models