1Cademy - Identifying Core Functions in a Transformer Block

Learn Before

Core Function $F(\cdot)$ in Transformer Sub-layers

Short Answer

A standard Transformer block passes an input matrix X through two sequential sub-layers. The computations are as follows:

H' = LayerNorm(X + MultiHeadSelfAttention(X))
Y = LayerNorm(H' + FeedForwardNetwork(H'))

For each of the two steps above, identify the function that serves as the core computational function, denoted $F(\cdot)$, and briefly describe its primary purpose.
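Both equations share the same shape, LayerNorm(input + F(input)), which can be made concrete with a small sketch. The code below is an illustration only, written in plain Python under several assumptions: `post_norm_sublayer`, `toy_self_attention`, and `toy_feed_forward` are hypothetical names, the attention stand-in is single-head with no learned weights, the feed-forward stand-in is a bare ReLU, and the layer normalization omits the learned gain and bias. A real block would use trained parameters and multiple heads.

```python
import math

def layer_norm(rows, eps=1e-5):
    """Normalize each row to zero mean and (almost) unit variance.
    Assumption: no learned gain/bias, unlike a trained LayerNorm."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def add(a, b):
    """Element-wise sum of two equally shaped matrices (the residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def post_norm_sublayer(x, F):
    """Generic sub-layer pattern from the equations: LayerNorm(x + F(x))."""
    return layer_norm(add(x, F(x)))

def toy_self_attention(x):
    """Hypothetical stand-in for MultiHeadSelfAttention:
    single head, no projections, softmax(x x^T / sqrt(d)) x."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * row[j] for w, row in zip(weights, x)) for j in range(d)])
    return out

def toy_feed_forward(x):
    """Hypothetical stand-in for FeedForwardNetwork:
    a ReLU applied to every row independently, with no learned weights."""
    return [[max(0.0, v) for v in row] for row in x]

def transformer_block(x):
    """H' = LayerNorm(X + F1(X)), then Y = LayerNorm(H' + F2(H'))."""
    h = post_norm_sublayer(x, toy_self_attention)
    return post_norm_sublayer(h, toy_feed_forward)

# Three token rows with four features each (made-up numbers).
X = [[1.0, 2.0, 3.0, 4.0],
     [4.0, 3.0, 2.0, 1.0],
     [0.5, -1.0, 2.0, 0.0]]
Y = transformer_block(X)
```

Passing the inner function as the argument `F` keeps the residual addition and the normalization identical across both steps, so the only thing that differs between the two sub-layers is the function plugged in.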

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences