Short Answer

Connecting Model Scale and Architectural Design

A machine learning team successfully trained a 1-billion-parameter language model using a standard network architecture. When they scale the exact same architecture up to 100 billion parameters and begin training with a proportionally larger dataset, they find the training process repeatedly fails. Based on the principles of large-scale model training, explain the most likely reason for this discrepancy in training stability between the two model sizes.
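The most likely reason is that training stability does not transfer across scale. Hyperparameter and design choices that are safe at 1 billion parameters, such as the learning rate, initialization scale, layer-normalization placement, and numerical precision, sit much closer to the edge of divergence at 100 billion parameters. As width and depth grow, activations and gradients take on larger and more variable magnitudes, so the same settings now produce loss spikes, overflow in low-precision arithmetic, and NaN losses: the run repeatedly fails even though nothing about the architecture is wrong in principle. This is why large-scale training recipes lower the peak learning rate, lengthen the warmup period, clip the global gradient norm, prefer pre-norm residual connections over post-norm, and use bf16 (or fp16 with dynamic loss scaling) rather than raw fp16.

As an illustrative sketch only (none of this code is from the question itself), here is what the standard stability mitigations look like in a PyTorch-style training loop: a lower peak learning rate, linear warmup, and gradient-norm clipping. The model, batch shapes, and hyperparameter values are placeholders.

```python
import torch
from torch import nn

# Toy stand-in for the scaled-up model; the stability fixes live in
# the training loop, not in the module definition.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Larger models need a lower peak learning rate than the 1B run used.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear warmup: ramp the learning rate up from ~0 so early, noisy
# gradients cannot blow up the freshly initialized weights.
warmup_steps = 2000  # placeholder; large runs often warm up for longer
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

loss_fn = nn.MSELoss()

for step in range(10):  # a few dummy steps for illustration
    x = torch.randn(32, 512)
    loss = loss_fn(model(x), torch.randn(32, 512))
    optimizer.zero_grad()
    loss.backward()
    # Clip the global gradient norm so a single bad batch cannot
    # trigger the loss spikes that derail large-model training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```

In real 100B-scale runs the same ideas appear as pre-norm Transformer blocks, bf16 arithmetic (or fp16 with a dynamic loss scaler), and warmup schedules measured in thousands of steps; fp16 overflow in particular is one of the most common concrete causes of the repeated failures described in the question.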


Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy
