Formula

Layer-wise Transformation of Hidden States

In a multi-layer neural network architecture, such as a Transformer, computation proceeds sequentially through the layers. The output of layer $l$, represented by the matrix of hidden states $\mathbf{H}^{l}$, becomes the input to the subsequent layer, $l+1$. This transformation is generally expressed by the formula:

$$\mathbf{H}^{l+1} = \text{Layer}(\mathbf{H}^{l})$$

This equation signifies that the hidden states of the next layer are a function of the current layer's hidden states, where $\text{Layer}(\cdot)$ encapsulates the layer's specific operations (e.g., self-attention and the feed-forward network).
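The update above can be sketched in plain Python. This is a hypothetical toy, not an actual Transformer layer: each "layer" here applies a linear map, a ReLU nonlinearity, and a residual connection, illustrating how each layer consumes the previous layer's hidden-state matrix and produces one of the same shape.

```python
def matmul(h, w):
    # (seq_len x d) @ (d x d) matrix product, pure Python
    return [[sum(h[i][k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(h))]

def layer(h, w):
    # Toy update: H^{l+1} = H^l + ReLU(H^l W)
    # (residual connection plus a simple nonlinearity)
    z = matmul(h, w)
    return [[h[i][j] + max(z[i][j], 0.0) for j in range(len(h[0]))]
            for i in range(len(h))]

seq_len, d_model, num_layers = 3, 4, 2
h = [[0.1 * (i + j) for j in range(d_model)] for i in range(seq_len)]  # H^0
ws = [[[0.05] * d_model for _ in range(d_model)] for _ in range(num_layers)]

# H^{l+1} = Layer(H^l): hidden states flow layer by layer
for w in ws:
    h = layer(h, w)

print(len(h), len(h[0]))  # shape is preserved across layers: 3 4
```

Note that the hidden-state matrix keeps the same shape at every layer, which is what allows an arbitrary number of identical layers to be stacked.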


Updated 2025-10-09


Ch.3 Prompting - Foundations of Large Language Models