Analyzing Training Instability in a Network Sub-layer
Based on the described sub-layer architecture, explain why the placement of the normalization step might be contributing to the observed training instability. Specifically, how does the computational path affect the gradient flow back to the main branch of the residual connection?
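The contrast at issue can be sketched in code. Below is a minimal numpy illustration (the function names `sublayer_post_norm` and `sublayer_pre_norm` and the bare `layer_norm` without learned scale/shift are hypothetical simplifications, not from the source): in the Post-Norm arrangement the normalization sits directly on the residual path, so every gradient flowing back to the input x must pass through Norm, whereas in the Pre-Norm arrangement the identity path from x to the output is left untouched.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Minimal LayerNorm over the feature dimension (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sublayer_post_norm(x, F):
    # Post-Norm: Norm(x + F(x)). The normalization lies on the residual
    # path itself, so gradients to x are rescaled by Norm's Jacobian.
    return layer_norm(x + F(x))

def sublayer_pre_norm(x, F):
    # Pre-Norm: x + F(Norm(x)). The identity path x -> output bypasses
    # Norm entirely, giving a direct gradient route back to the input.
    return x + F(layer_norm(x))

# Illustration: with a constant input, layer_norm maps it to zero, so the
# Post-Norm output loses the input signal, while Pre-Norm passes x through.
x = np.ones((2, 4))
F = lambda h: 0.5 * h  # toy sub-layer function with F(0) = 0
print(np.allclose(sublayer_pre_norm(x, F), x))   # identity path preserved
```

The point of the toy check: because Pre-Norm's residual branch is never normalized, the input survives unchanged when the sub-layer function contributes nothing, which mirrors the cleaner gradient highway that is commonly cited as the reason Pre-Norm trains more stably.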
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A sub-layer in a neural network processes an input tensor. The sub-layer uses a specific architectural pattern where a residual connection and a normalization step are applied after the main sub-layer function. Arrange the following operations in the correct sequence to compute the final output of this sub-layer.
A sub-layer within a neural network processes an input x. The design specifies that the output of the sub-layer's main function, F(x), is first added to the original input x. A normalization function, Norm(·), is then applied to the result of this addition. Which of the following expressions accurately models this computation?