Formula

General Formula for a Transformer Layer

In a multi-layer model such as a Transformer, computation proceeds sequentially through the layers. The output of layer $l$, a sequence of hidden states denoted $\mathbf{H}^{l}$, serves as the input to the subsequent layer. The transformation is captured by the general formula

$$\mathbf{H}^{l+1} = \text{Layer}(\mathbf{H}^{l})$$

This equation indicates that the hidden states for layer $l+1$ are generated by applying the specific operations of the Layer function (e.g., self-attention, feed-forward network) to the hidden states of layer $l$.
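The sequential application of layers can be sketched in a few lines of code. This is a minimal illustration, not a real Transformer: the `layer` function below stands in for the actual self-attention and feed-forward computations, and all names and shapes are assumptions chosen for the example. The point is only the recurrence $\mathbf{H}^{l+1} = \text{Layer}(\mathbf{H}^{l})$, where each layer consumes the previous layer's hidden states and the shape of the hidden-state matrix is preserved from layer to layer.

```python
import numpy as np

def layer(h, w):
    """Stand-in for one Transformer layer. In a real model this would be
    self-attention followed by a feed-forward network; here it is a single
    linear map plus a ReLU, purely to illustrate the layer-to-layer flow."""
    return np.maximum(h @ w, 0.0)

def forward(h0, weights):
    """Apply H^{l+1} = Layer(H^l) sequentially through every layer."""
    h = h0
    for w in weights:       # one weight matrix per layer (hypothetical)
        h = layer(h, w)     # output of layer l becomes input to layer l+1
    return h

rng = np.random.default_rng(0)
seq_len, d_model, num_layers = 4, 8, 3
h0 = rng.standard_normal((seq_len, d_model))          # H^0: input hidden states
weights = [rng.standard_normal((d_model, d_model)) for _ in range(num_layers)]
out = forward(h0, weights)
print(out.shape)  # same (seq_len, d_model) shape as the input hidden states
```

Because each layer maps a `(seq_len, d_model)` matrix to another matrix of the same shape, layers can be stacked to any depth without changing the interface between them.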


Updated 2025-10-08


Tags

Ch.3 Prompting - Foundations of Large Language Models
