Multiple Choice

A research team is using the following empirical formula to guide the training strategy for a large language model, where $\mathcal{L}$ is the test loss, $N$ is the model size, and $D$ is the dataset size:

$$\mathcal{L}(N,D) = \frac{406.4}{N^{0.34}} + \frac{410.7}{D^{0.28}} + 1.69$$

To achieve the most substantial reduction in test loss, which of the following strategies does this formula predict to be more effective?
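One way to reason about the question is to evaluate the formula directly and compare how much loss each strategy removes. The sketch below assumes $N$ counts parameters and $D$ counts training tokens, and it uses a hypothetical starting point (1e9 parameters, 20e9 tokens) chosen only for illustration. The helpers `loss` and `compare_doubling` are likewise hypothetical names. Because the irreducible term 1.69 is unaffected by either change, only the two power-law terms matter. Scaling $N$ by a factor $k$ removes $\frac{406.4}{N^{0.34}}\,(1 - k^{-0.34})$, and scaling $D$ by $k$ removes $\frac{410.7}{D^{0.28}}\,(1 - k^{-0.28})$.

```python
# Fitted constants taken from the formula in the question
A, ALPHA = 406.4, 0.34   # model-size term: A / N**ALPHA
B, BETA = 410.7, 0.28    # dataset-size term: B / D**BETA
E = 1.69                 # irreducible loss, unaffected by N or D


def loss(n: float, d: float) -> float:
    """Predicted test loss for model size n and dataset size d."""
    return A / n**ALPHA + B / d**BETA + E


def compare_doubling(n: float, d: float) -> dict:
    """Loss reduction from doubling N versus doubling D (hypothetical helper)."""
    base = loss(n, d)
    return {
        "base": base,
        "double_N": base - loss(2 * n, d),  # gain from a 2x larger model
        "double_D": base - loss(n, 2 * d),  # gain from 2x more data
    }


# Hypothetical starting point, assumed for illustration only
result = compare_doubling(1e9, 20e9)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The better strategy depends on where the current $(N, D)$ sits, because each term's contribution shrinks as its own variable grows. Rerunning `compare_doubling` with the team's actual model and dataset sizes shows which term currently dominates.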

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science