Learn Before
General Formula for a Transformer Layer
In a multi-layer model such as a Transformer, the computation proceeds sequentially through its layers. The output of layer l-1, which is a sequence of hidden states denoted as H_{l-1}, serves as the input to the subsequent layer. The transformation is captured by the general formula:

H_l = Layer_l(H_{l-1})
This equation indicates that the hidden states H_l for layer l are generated by applying the specific operations of the Layer_l function (e.g., self-attention, feed-forward network) to the hidden states H_{l-1} of layer l-1.
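The sequential rule H_l = Layer_l(H_{l-1}) can be sketched in a few lines of toy code. This is a hypothetical illustration, not a real Transformer: toy_layer stands in for Layer_l (in a real model it would be self-attention plus a feed-forward network), and the key point it demonstrates is that each layer maps a sequence of per-token hidden-state vectors to a new sequence of the same shape, which then feeds the next layer.

```python
# Hypothetical toy model: each "layer" transforms a sequence of
# hidden-state vectors (num_tokens x d_model) into a new sequence
# of the SAME shape, preserving the one-vector-per-token structure.

def toy_layer(h, bias):
    # Stand-in for Layer_l; a real layer would apply self-attention
    # and a feed-forward network. Shape is preserved.
    return [[x + bias for x in vec] for vec in h]

def run_model(h0, num_layers):
    h = h0
    for l in range(1, num_layers + 1):
        h = toy_layer(h, bias=float(l))  # H_l = Layer_l(H_{l-1})
    return h

h0 = [[0.0, 0.0], [1.0, 1.0]]  # 2 tokens, d_model = 2
h3 = run_model(h0, num_layers=3)
print(h3)  # still 2 vectors of size 2: [[6.0, 6.0], [7.0, 7.0]]
```

Note that the output after three layers is still a sequence of two vectors, one per token; the layers transform the representations but never collapse them into a single vector.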
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Transformer Layer Output Formula
General Formula for a Transformer Layer
Input Composition in a Prefix-Tuned Transformer Layer
A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?
Structure of a Transformer Layer's Input
When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a single, high-dimensional vector that represents the aggregated meaning of the entire sentence as computed by the first four layers.
Learn After
In a standard multi-layer model, the output of a given layer serves as the direct input to the next, creating a sequential chain of processing. Consider an alternative architecture where the input to any given layer (beyond the first) is a combination of the initial input to the entire network and the output of the immediately preceding layer. What is the primary computational difference introduced by this alternative design compared to the standard sequential model?
A multi-layer model processes information sequentially. Given an initial input matrix of hidden states, denoted as H_0, and the outputs of three subsequent layers, H_1, H_2, and H_3, arrange these matrices in the correct order of their generation and processing within the model, from start to finish.
In a deep, multi-layer model, a computational error occurs during the processing of the 5th layer, causing its output matrix of hidden states, H_5, to become corrupted. Based on the standard sequential processing flow where the output of one layer becomes the input for the next, which subsequent layers will be directly impacted by this corrupted data?