Learn Before
A standard processing block in a Transformer model consists of two main sub-layers applied in sequence. The first sub-layer's primary role is to relate different positions of the input sequence to compute a new representation for each position. The second sub-layer then applies an identical non-linear transformation to each position's representation independently. How does the core computational function, denoted F(·), implemented within each of these sub-layers differ between the two?
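The contrast can be made concrete with a minimal sketch of a standard post-norm Transformer encoder block (PyTorch is assumed here; the layer sizes are illustrative choices, not taken from the card). The first core function mixes information across positions via self-attention, while the second applies the same feed-forward network to every position independently, each wrapped in a residual connection and layer normalization.

```python
# Minimal sketch of a standard post-norm Transformer encoder block, contrasting
# the core function F(.) of each sub-layer:
#   Sub-layer 1: multi-head self-attention -- F(x) relates all positions to each other.
#   Sub-layer 2: position-wise feed-forward network -- F(x) applies the same weights
#                to every position independently.
# Hyperparameters (d_model, n_heads, d_ff) are illustrative, not from the source.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        # Core function of sub-layer 1: self-attention mixes information across positions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Core function of sub-layer 2: the same two-layer MLP is applied at every position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Sub-layer 1: F(x) = SelfAttention(x); each output depends on *all* positions.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)      # residual connection + layer normalization
        # Sub-layer 2: F(x) = FFN(x); each position is transformed independently.
        x = self.norm2(x + self.ffn(x))   # residual connection + layer normalization
        return x


if __name__ == "__main__":
    block = TransformerBlock()
    tokens = torch.randn(2, 10, 512)      # (batch=2, seq_len=10, d_model=512)
    print(block(tokens).shape)            # torch.Size([2, 10, 512])
```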
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Scaled Dot-Product Attention
Multi-Head Self-Attention Function
Purpose and Structure of the Feed-Forward Network (FFN) in Transformers
Identifying Core Functions in a Transformer Block
A standard processing block in a certain neural network architecture consists of two main sub-layers. Each sub-layer's computation can be described as applying a core function, F(·), within a structure that also includes a residual connection and layer normalization. Match each sub-layer type with the correct description of its core computational function, F(·).