Short Answer

Analysis of Sub-Layer Computational Flow

A standard sub-layer in a deep neural network processes an input X using a function F and a normalization operation Norm. The output Y is calculated using the formula: Y = Norm(F(X)) + X. Now, consider an alternative formulation: Y_alt = Norm(F(X) + X). Describe the fundamental difference in the order of operations between these two formulas and explain how this change affects what is being normalized within the computational path.

0

1

Updated 2025-10-04

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science