1Cademy - Evaluating a Training Strategy for a New Large Model

Learn Before

Architectural Modifications for Trainable LLMs

Case Study

Based on the provided scenario, evaluate the team's strategy and identify the most critical oversight that is likely causing these training failures. Justify your reasoning.

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Layer Normalization in Transformers
A research team is training a very deep language model based on a standard network design. They observe that as they increase the model's depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?
Rationale for Architectural Changes in Large-Scale Models
Connecting Model Scale and Architectural Design
Omission of Bias Terms in LLM Affine Transformations
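
The deep-network instability scenario and the layer-normalization topic listed above both concern where normalization sits inside a Transformer residual block. The sketch below is a minimal, dependency-free Python illustration, not the course's own code. It contrasts the two common placements: post-norm, LN(x + F(x)), and pre-norm, x + F(LN(x)). Pre-norm is widely reported to train more stably as depth grows. The names `layer_norm`, `post_norm_block`, `pre_norm_block`, and `toy_sublayer` are hypothetical. The sub-layer F stands in for attention or a feed-forward network, and the learnable gain and bias of layer normalization are omitted for brevity.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learnable gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Post-norm: add the residual first, then normalize, i.e. LN(x + F(x))."""
    fx = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, fx)])


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Pre-norm: normalize only the sub-layer input and keep an identity
    residual path, i.e. x + F(LN(x))."""
    fx = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, fx)]


def toy_sublayer(x: Vector) -> Vector:
    # Hypothetical stand-in for attention or a feed-forward network.
    return [0.5 * v for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h_post, h_pre = x, x
    for _ in range(3):  # stack three blocks of each kind
        h_post = post_norm_block(h_post, toy_sublayer)
        h_pre = pre_norm_block(h_pre, toy_sublayer)
    print("post-norm:", [round(v, 3) for v in h_post])
    print("pre-norm: ", [round(v, 3) for v in h_pre])
```

In the pre-norm variant, the input x reaches the block output unchanged except for the added F term, so the residual stream gives gradients a direct path through a deep stack. In the post-norm variant, every block's output passes through normalization. Production LLMs typically add a learnable gain and may swap in a variant such as RMSNorm. This sketch leaves both out.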