1Cademy - A research team is training a very deep language model based on a standard network design. They observe that as they increase the models depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?

Learn Before

Architectural Modifications for Trainable LLMs

Multiple Choice

A research team is training a very deep language model based on a standard network design. They observe that as they increase the model's depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?

Updated 2025-10-02

Contributors are:

Who are from:

Learn Before

Related