Short Answer

Identifying Core Functions in a Transformer Block

A standard Transformer block passes an input matrix X through two sequential sub-layers. The computations are as follows:

  1. H' = LayerNorm(X + MultiHeadSelfAttention(X))
  2. Y = LayerNorm(H' + FeedForwardNetwork(H'))

Both steps share the general form Output = LayerNorm(Input + F(Input)). For each of the two steps above, identify the specific function that plays the role of the core function F(·), and briefly describe its primary purpose.
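As a rough illustration of how the two steps compose, here is a minimal pure-Python sketch. It rests on several assumptions that are not part of the question: a single attention head stands in for MultiHeadSelfAttention, the feed-forward network is two linear maps with a ReLU in between, LayerNorm has no learned scale or shift, and every function and weight name (sublayer, wq, wk, wv, w1, w2) is hypothetical.

```python
import math

def matmul(a, b):
    # Plain matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Element-wise sum: the residual connection.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance
    # (assumption: no learned scale/shift parameters).
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    # Assumption: one head only, as a stand-in for MultiHeadSelfAttention.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # q @ k^T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def feed_forward(h, w1, w2):
    # Assumption: two linear maps with a ReLU, applied to each row independently.
    hidden = [[max(0.0, v) for v in row] for row in matmul(h, w1)]
    return matmul(hidden, w2)

def sublayer(x, f):
    # Shared shape of both steps: LayerNorm(x + F(x)).
    return layer_norm(add(x, f(x)))

def transformer_block(x, wq, wk, wv, w1, w2):
    h = sublayer(x, lambda t: self_attention(t, wq, wk, wv))  # step 1
    y = sublayer(h, lambda t: feed_forward(t, w1, w2))        # step 2
    return y
```

Calling transformer_block with an n-by-d input, three d-by-d attention weight matrices, and feed-forward weights of shapes d-by-m and m-by-d returns an n-by-d output. Writing both steps through the same sublayer helper makes the role of F(·) explicit: only the function passed in changes between step 1 and step 2.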

Updated 2025-10-08
