Learn Before
Transformer Layer Output Formula
The output of a Transformer layer at depth l, denoted as H^l, is computed by applying the layer's transformation function, Layer_l, to its input, H^{l-1}. This relationship can be expressed by the formula: H^l = Layer_l(H^{l-1}).
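The recurrence H^l = Layer_l(H^{l-1}) can be sketched in code. This is a minimal illustration, not a real Transformer block: each "layer" below is a stand-in transformation (a linear map plus a nonlinearity), and the dimensions (5 tokens, hidden size 8, 3 layers) are assumed for the example. The point is that each layer maps a matrix of per-token vectors to a same-shaped matrix, which then becomes the next layer's input.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens = 5   # one vector per token (assumed)
d_model = 8      # hidden size (assumed)
num_layers = 3   # depth (assumed)

def make_layer(rng, d_model):
    # Stand-in for Layer_l: any function mapping an (num_tokens, d_model)
    # matrix to a matrix of the same shape.
    W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return lambda H: np.tanh(H @ W)

layers = [make_layer(rng, d_model) for _ in range(num_layers)]

H = rng.standard_normal((num_tokens, d_model))  # H^0: initial input matrix
for layer in layers:
    H = layer(H)  # H^l = Layer_l(H^{l-1}), applied sequentially

# The sequence structure (one vector per token) is preserved at every depth.
print(H.shape)
```

Note that the loop composes the layers, so after three iterations H holds Layer_3(Layer_2(Layer_1(H^0))), while the matrix shape (num_tokens, d_model) never changes.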

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Transformer Layer Output Formula
General Formula for a Transformer Layer
Input Composition in a Prefix-Tuned Transformer Layer
A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?
Structure of a Transformer Layer's Input
When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a single, high-dimensional vector that represents the aggregated meaning of the entire sentence as computed by the first four layers.
Learn After
An initial input matrix, denoted as H^0, is processed sequentially through three computational layers: Layer_1, Layer_2, and Layer_3. Which expression correctly calculates the output matrix of Layer_3?
A computational model processes an initial input matrix, H^0, through three sequential layers. Arrange the following hidden state matrices in the order they are generated.
In a multi-layer computational model, the output of the fifth layer is a matrix of hidden states denoted as H^5. This matrix serves as the input to the sixth layer, which has a transformation function represented as Layer_6. The output of this sixth layer, H^6, is calculated by the formula: H^6 = \text{____}.